Last updated 2026-06-15

Enterprise NLP vs. Regex

anonym.legal vs Caviard.ai

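Caviard.ai is a Chrome extension that detects PII with regular-expression patterns, reaching 60-75% recall with a 15-30% false positive rate, which is not enough for regulated compliance work. anonym.legal's 3-layer NLP engine delivers 92-98% recall across 48 languages on web, desktop, the Office Add-in, and all browsers, with deterministic, auditable results.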

Learn more about Caviard.ai

Feature comparison

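Feature | anonym.legal | Caviard.ai
Detection technology | Yes (3-layer NLP) | Regex patterns only
Entity types | 285+ | About 30-50 patterns
Language support | 48 languages | Limited (non-ASCII regex gaps)
Platform support | Yes (web, desktop, Office Add-in, all browsers) | Chrome extension only
Per-entity confidence scoring | Yes |
Deterministic results | Yes | Pattern-based only
Recall | Yes (92-98%) | 60-75%
False positive rate | Yes (below 5%) | 15-30%
ISO 27001 | Yes | Not documented
Compliance audit trail | Yes |
Reversible encryption | AES-256-GCM | No (local browser processing)
Office Add-in | Yes |
Pricing | Free to €29/mo | Not published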

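This comparison is based on publicly available information. "Not found" means the product page does not describe the feature. Last updated February 2026.

Why choose anonym.legal

All browsers plus desktop, not just Chrome

anonym.legal works in Chrome, Firefox, Edge, Safari, and the desktop app. Caviard.ai is a Chrome extension, so employees on other browsers have no protection.

Deterministic NLP vs. regex patterns

anonym.legal uses 3-layer NLP (Presidio + spaCy + an XLM-RoBERTa transformer). Regex cannot understand context: it misses location entities, confuses company names with ordinary text, and, when its patterns assume ASCII, fails on non-ASCII scripts.

ISO 27001-certified infrastructure

anonym.legal runs on Hetzner in Germany, which holds ISO 27001 certification. Caviard.ai has no documented security certifications.

48 languages vs. regex gaps

Regex-based detection written around ASCII character classes fails on German umlauts, Arabic, Chinese, Hebrew, and other non-ASCII text. anonym.legal's multilingual NLP covers 48 languages natively.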
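One common way the non-ASCII gap shows up is a pattern written with ASCII-only character classes. The sketch below is illustrative only: both patterns are hypothetical and are not taken from Caviard.ai or anonym.legal. It compares an ASCII-only capitalized-word pattern with a Unicode-aware word pattern on a short German sample.

```python
import re

# Hypothetical patterns, written only for this illustration.
ASCII_NAME = re.compile(r"\b[A-Z][a-z]+\b")    # ASCII letters only
UNICODE_WORD = re.compile(r"\b[^\W\d_]+\b")    # roughly: any run of Unicode letters

text = "Kontakt: Jürgen Müller, Köln"

# The ASCII pattern stops at the first umlaut, so the names never match.
ascii_hits = ASCII_NAME.findall(text)

# The Unicode-aware pattern sees every word, names included.
unicode_hits = UNICODE_WORD.findall(text)

print(ascii_hits)    # ['Kontakt']
print(unicode_hits)  # ['Kontakt', 'Jürgen', 'Müller', 'Köln']
```

A Unicode-aware pattern fixes the character-class problem, but it still cannot tell a surname from any other word such as "Kontakt". That remaining gap is the context problem an NER model is meant to close.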

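Per-entity confidence scoring

Every detection includes a 0-100% confidence score and the rule or model that triggered it, which legal defensibility and HIPAA audit trails require. Caviard.ai does not provide confidence scores.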
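To make the idea concrete, here is a minimal sketch of what a per-detection record could look like. The `Detection` class, its field names, and the `above_threshold` helper are assumptions made for this example, not anonym.legal's actual schema or API. The timestamp field mirrors the audit-trail description in the FAQ on this page, and the score is stored as 0.0-1.0 rather than 0-100%.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Detection:
    """Hypothetical audit record for one detected entity."""
    entity_type: str   # e.g. "PERSON"
    start: int         # character offset where the entity begins
    end: int           # character offset just past the entity
    score: float       # confidence, 0.0 to 1.0
    source: str        # rule ID or model name that produced the hit
    detected_at: str   # ISO 8601 timestamp in UTC


def make_detection(entity_type: str, start: int, end: int,
                   score: float, source: str) -> Detection:
    """Validate the score and stamp the record with the current UTC time."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    stamp = datetime.now(timezone.utc).isoformat()
    return Detection(entity_type, start, end, score, source, stamp)


def above_threshold(detections: list, threshold: float) -> list:
    """Keep only the detections whose confidence meets the threshold."""
    return [d for d in detections if d.score >= threshold]


detections = [
    make_detection("PERSON", 0, 10, 0.97, "model:example-ner"),
    make_detection("PHONE", 20, 30, 0.40, "rule:example-phone"),
]
kept = above_threshold(detections, 0.8)
print([d.entity_type for d in kept])  # ['PERSON']
```

Keeping the source and timestamp on every record is what lets a reviewer later explain why a given span was or was not anonymized.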

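285+ entity types

Country-specific IDs with checksum validation, NER in 48 languages, medical record numbers, and financial identifiers. Caviard.ai covers about 30-50 regex patterns.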
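Checksum validation is one way to cut false positives on numeric identifiers. The sketch below uses the Luhn algorithm for payment card numbers purely as a well-known example. This page does not say which checksums anonym.legal applies, and the `CARD_CANDIDATE` pattern and function names are hypothetical.

```python
import re

# Hypothetical candidate pattern: 13 to 19 digits, optionally separated
# by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list:
    """Return pattern matches that also pass the checksum."""
    return [m.group() for m in CARD_CANDIDATE.finditer(text)
            if luhn_valid(m.group())]


sample = "Card 4111 1111 1111 1111, order ref 4111 1111 1111 1112."
print(find_card_numbers(sample))  # ['4111 1111 1111 1111']
```

Both numbers in the sample match the digit pattern, but only the first passes the checksum, so the order reference is not flagged.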

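When anonym.legal is the right choice

anonym.legal is a better fit than Caviard.ai when:

  • You need compliance-grade recall (92-98%) rather than basic pattern matching (60-75%)
  • Your team uses Firefox, Edge, Safari, or a desktop app, not only Chrome
  • You process multilingual content: German, French, Arabic, Chinese, Hebrew, or any of the 48 languages
  • You need per-entity confidence scores and an audit trail for HIPAA, GDPR, or e-discovery
  • You need reversible anonymization, so placeholders can be decrypted when the law requires it

Frequently asked questions

What is the difference between regex-based and NLP-based PII detection?

Regex patterns match fixed text structures (for example, the SSN format). They miss context-dependent PII: names in a sentence, location entities, and anything whose pattern varies slightly. NLP models understand linguistic context. anonym.legal's 3-layer pipeline (Presidio + spaCy + XLM-RoBERTa) reaches 92-98% recall, compared with 60-75% for regex-only tools such as Caviard.ai.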
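A small illustration of that difference, assuming a hypothetical SSN-shaped pattern and made-up sample text: the pattern flags every string with the 3-2-4 digit shape, including a part number, and finds neither the name nor the city.

```python
import re

# Hypothetical pattern for the US SSN layout (3-2-4 digits); illustration only.
SSN_SHAPE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

text = (
    "Maria Keller's SSN is 123-45-6789 and she lives in Lyon. "
    "Replacement part 987-65-4321 ships Monday."
)

hits = SSN_SHAPE.findall(text)
print(hits)  # ['123-45-6789', '987-65-4321']
```

The second hit is a false positive, and "Maria Keller" and "Lyon" are misses. A shape-only pattern has no way to use the surrounding words to decide either case.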

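Does Caviard.ai work in Firefox, Edge, or Safari?

No. Caviard.ai is a Chrome extension and works only in Chrome-based browsers. anonym.legal works in all major browsers through its web app, offers dedicated extensions for Chrome and Edge, and includes standalone desktop apps for Windows, macOS, and Linux.

What security certifications does Caviard.ai have?

Caviard.ai does not publish an ISO 27001 or SOC 2 certification. anonym.legal runs on ISO 27001-certified Hetzner infrastructure in Germany, with a GDPR-compliant data processing agreement and zero-knowledge authentication verified by an independent security audit.

How does anonym.legal handle the multilingual PII that Caviard.ai misses?

Regex patterns that assume ASCII fail on non-ASCII characters: German umlauts (ä, ö, ü), Arabic script, Chinese characters, Hebrew script. anonym.legal's NLP models are trained on 48 languages and handle character normalization, Unicode boundaries, and language-specific ID formats (German Personalausweis, French NIR, Arabic national IDs, and others).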
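Character normalization matters because the same visible name can be stored as different code point sequences. The following sketch uses Python's standard `unicodedata` module to show the problem and one common remedy (NFC normalization). It is an illustration of the general technique, not a description of anonym.legal's internals, and the `normalize` helper name is made up for this example.

```python
import unicodedata

composed = "M\u00fcller"      # "ü" as a single code point (NFC form)
decomposed = "Mu\u0308ller"   # "u" followed by a combining diaeresis (NFD form)


def normalize(text: str) -> str:
    """Convert text to NFC so equivalent spellings compare equal."""
    return unicodedata.normalize("NFC", text)


# The two strings render identically but differ code point by code point.
same_before = composed == decomposed
same_after = normalize(composed) == normalize(decomposed)

print(len(composed), len(decomposed))  # 6 7
print(same_before, same_after)         # False True
```

Without this step, a lookup list or pattern built from one form can silently miss text that arrives in the other form.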

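What false positive rate can I expect?

Caviard.ai's regex approach produces a 15-30% false positive rate, flagging non-PII text as sensitive. That leads to unnecessary redaction of legitimate content. anonym.legal's NLP pipeline brings false positives below 5% through contextual understanding, confidence score thresholds, and per-entity override controls.

Does anonym.legal provide a compliance audit trail?

Yes. Every detection includes the entity type, confidence score, detection method (rule ID or model name), and a timestamp, creating a defensible audit trail for HIPAA, GDPR, and e-discovery requirements. Caviard.ai does not provide a per-detection audit trail.

Enterprise NLP PII detection

92-98% recall. 48 languages. All browsers plus desktop. ISO 27001. Start for free.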

About this page

We update this page when our platform or the law changes.

Read our founder note for how we work.

Each change shows up in the timestamp at the top.

We follow these rules

  • GDPR (EU 2016/679).
  • ISO/IEC 27001:2022.
  • NIS2 (EU 2022/2555).
  • HIPAA safe harbor under 45 CFR § 164.514(b)(2).

Our promise

We do not sell your data.

We do not train models on your text.

We store your files in Germany.

You can delete your account at any time.

You own your work.

Where we run

Our servers live in Falkenstein, Germany.

We use Hetzner. They hold ISO 27001 certification.

All data stays in the EU.

Backups run every day.

Need help?

Email support@anonym.legal.

We reply within one business day.

How we test

We run a full check suite on every release.

Each surface gets its own sweep script and report.

Human reviewers spot-check the output each week.

We track recall and precision on a labelled set.

Bad runs block the deploy.
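As a rough sketch of how recall and precision on a labelled set can gate a release: the code below compares predicted spans with gold spans and returns a pass or fail decision. The span format, function names, and threshold values are assumptions for illustration. This page does not state the actual thresholds or tooling.

```python
def score(gold: set, predicted: set) -> tuple:
    """Return (precision, recall) for exact-match spans.

    Each span is a (start, end, entity_type) tuple. An empty denominator
    is treated as a perfect score, which is a convention chosen here.
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(gold) if gold else 1.0
    return precision, recall


def release_gate(precision: float, recall: float,
                 min_precision: float = 0.95,
                 min_recall: float = 0.92) -> bool:
    """Hypothetical gate: True means the run may deploy."""
    return precision >= min_precision and recall >= min_recall


gold = {(0, 5, "PERSON"), (10, 20, "EMAIL"), (30, 41, "PHONE"), (50, 60, "IBAN")}
predicted = {(0, 5, "PERSON"), (10, 20, "EMAIL"), (30, 41, "PHONE"), (70, 75, "PERSON")}

precision, recall = score(gold, predicted)
print(precision, recall)                # 0.75 0.75
print(release_gate(precision, recall))  # False
```

Here one missed IBAN lowers recall and one spurious PERSON lowers precision, so this run would be blocked under the example thresholds.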

What we never do

  • We never sell your information to third parties.
  • We never train models on what you upload.
  • We never keep your work after you delete it.
  • We never share keys with any outside firm.
  • We never run ads inside the product.

Plans in plain words

We sell credits, not seats.

One credit covers one short job.

Long jobs use a few credits each.

You can top up at any time.

Unused credits roll over each month.

Read the plans page for current rates.

Who built this

A small team of engineers and lawyers built this.

We ship from Europe and work in the open.

Our founder note spells out why we started.

How the parts fit

A browser add-on cleans text inside Chrome.

A Word plug-in handles drafts in Office.

A small desktop tool works on whole folders.

An agent protocol link feeds large models safely.

All four share one core engine and one rule set.

Words from our team

We started this work after a lunch about cookies.

One friend kept getting odd ads on her phone.

We asked why a court file leaked through a draft.

We sketched the first build on a napkin that week.

By month three we had a tiny demo for a friend.

She used it on her first case the next day.

Common questions we hear

Can the tool read scanned PDFs? Yes, with OCR.

Does it work on long files? Yes, in small chunks.

Can I roll my own rule set? Yes, save it as a preset.

Does it run offline? The desktop build runs offline.

Do you keep my files? No, the cloud build wipes after each run.

Will it learn from my work? No, we never train on inputs.

A short tour of the workflow

Upload a file or paste a snippet of prose.

Pick the entities you want gone from the draft.

Choose a method: replace, mask, hash, encrypt, or redact.

Press run and watch the side panel show each hit.

Skim the result and tweak any rule that misfired.

Save the cleaned file or send it to a teammate.
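Four of the methods named in the tour above (replace, mask, hash, redact) can be sketched with the Python standard library. This is a minimal illustration under stated assumptions: the function names, the placeholder format, the masking rule, and the demo key are all hypothetical, and encryption is left out because it needs a third-party library.

```python
import hashlib
import hmac


def mask(value: str, keep_last: int = 4, char: str = "*") -> str:
    """Hide all but the trailing characters, never showing more than half."""
    keep = min(keep_last, len(value) // 2)
    return char * (len(value) - keep) + value[len(value) - keep:]


def hash_value(value: str, key: bytes = b"demo-key") -> str:
    """Keyed SHA-256 digest; the key here is a placeholder, not a real secret."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize(text: str, detections: list, method: str) -> str:
    """Apply one method to every (start, end, entity_type) span.

    Spans are assumed not to overlap. They are processed right to left
    so earlier offsets stay valid after each substitution.
    """
    for start, end, entity_type in sorted(detections, reverse=True):
        original = text[start:end]
        if method == "replace":
            new = f"<{entity_type}>"
        elif method == "mask":
            new = mask(original)
        elif method == "hash":
            new = hash_value(original)
        elif method == "redact":
            new = ""
        else:
            raise ValueError(f"unknown method: {method}")
        text = text[:start] + new + text[end:]
    return text


text = "Call Ana at 555-0134."
name_at = text.index("Ana")
phone_at = text.index("555-0134")
detections = [(name_at, name_at + 3, "PERSON"), (phone_at, phone_at + 8, "PHONE")]

print(anonymize(text, detections, "replace"))  # Call <PERSON> at <PHONE>.
print(anonymize(text, detections, "mask"))     # Call **a at ****0134.
```

Replace keeps the entity type visible for readers, mask keeps a recognizable suffix, hash gives a stable pseudonym for the same input and key, and redact removes the span entirely.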