Last updated 2026-06-05


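Brazil's LGPD: CPF, CNPJ, and data protection

The LGPD covers Brazil's 215 million people, and the ANPD began major enforcement actions in 2024. Tools trained on English identify CPF numbers with only 45% accuracy.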

June 5, 2026 · 8 min read
Brazil LGPD · CPF detection · Brazilian Portuguese PII · ANPD compliance · South America data protection


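Brazil's General Data Protection Law (Lei Geral de Proteção de Dados, LGPD) covers 215 million people, making it the world's third-largest data protection regime by population, with roughly as many people in scope as Germany, France, and the United Kingdom combined. The National Data Protection Authority (Autoridade Nacional de Proteção de Dados, ANPD) issued its first major fines in 2024, and the transition period that followed the law's entry into force in 2020 is over.

There is a technical challenge as well: documents that fall under the LGPD are written in Brazilian Portuguese, and Brazil's national identifiers differ from those of Portugal and of every other country.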

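Why Brazilian personal data is different

Brazil's federal and state identification systems developed along a different path from Europe's digital identity systems and produced a distinct set of identifiers. Most NLP tools are trained on English or European data and cannot detect these local identifiers.

CPF (Cadastro de Pessoas Físicas, individual taxpayer registry number): an 11-digit taxpayer number formatted as XXX.XXX.XXX-XX. The last two digits are check digits computed in two steps, and both must match for the number to be valid.

The detection gap is significant. Tools trained on English identify CPF numbers with only 45% accuracy (ANPD, 2024). There are two reasons. First, a tool that only matches 11 digits without running the two-step check cannot tell a valid CPF from a random digit sequence. Second, CPFs sometimes appear without the XXX.XXX.XXX-XX punctuation, which is especially common in OCR output and plain-text forms.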
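
The two failure modes above can be addressed together. The following is a minimal sketch, not a production detector: it assumes the widely documented CPF mod-11 scheme (weights 10 down to 2 for the first check digit, 11 down to 2 for the second), and the names `CPF_CANDIDATE`, `cpf_check_digit`, `is_valid_cpf`, and `find_cpfs` are illustrative, not part of any library.

```python
import re

# Candidate pattern: punctuated (XXX.XXX.XXX-XX) or bare 11-digit CPFs,
# not embedded in a longer run of digits.
CPF_CANDIDATE = re.compile(r"(?<!\d)(?:\d{3}\.\d{3}\.\d{3}-\d{2}|\d{11})(?!\d)")


def cpf_check_digit(digits: list[int]) -> int:
    """Compute one CPF check digit from the digits that precede it."""
    # Weights count down to 2: 10..2 for the first digit, 11..2 for the second.
    weights = range(len(digits) + 1, 1, -1)
    total = sum(d * w for d, w in zip(digits, weights))
    return (total * 10) % 11 % 10  # a remainder of 10 maps to 0


def is_valid_cpf(text: str) -> bool:
    """Run both check-digit steps; both must match."""
    digits = [int(c) for c in re.findall(r"\d", text)]
    if len(digits) != 11 or len(set(digits)) == 1:
        # Wrong length, or a repeated-digit placeholder such as 111.111.111-11.
        return False
    first = cpf_check_digit(digits[:9])
    second = cpf_check_digit(digits[:9] + [first])
    return digits[9] == first and digits[10] == second


def find_cpfs(text: str) -> list[str]:
    """Return candidates from free text that pass both check-digit steps."""
    return [m.group() for m in CPF_CANDIDATE.finditer(text) if is_valid_cpf(m.group())]
```

Validating after matching is what separates a real CPF from an arbitrary 11-digit order or phone number, and accepting the bare form covers OCR output that has lost its punctuation.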

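CNPJ (Cadastro Nacional da Pessoa Jurídica, national registry of legal entities): a 14-digit business identifier formatted as XX.XXX.XXX/XXXX-XX. It also ends in two check digits, computed with an algorithm that is similar to the CPF's but not identical.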
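
To show how the CNPJ check differs from the CPF check, here is a minimal sketch for the numeric 14-digit format. It assumes the commonly documented weight sequences and the rule that a remainder below 2 yields a check digit of 0; the function and constant names are illustrative.

```python
import re

# Weights for the first check digit (applied to the first 12 digits)
# and for the second (applied to the first 12 digits plus the first check digit).
FIRST_WEIGHTS = [5, 4, 3, 2, 9, 8, 7, 6, 5, 4, 3, 2]
SECOND_WEIGHTS = [6] + FIRST_WEIGHTS


def cnpj_check_digit(digits: list[int], weights: list[int]) -> int:
    """Compute one CNPJ check digit as a weighted sum mod 11."""
    remainder = sum(d * w for d, w in zip(digits, weights)) % 11
    return 0 if remainder < 2 else 11 - remainder


def is_valid_cnpj(text: str) -> bool:
    """Accept punctuated or bare numeric CNPJs whose two check digits match."""
    digits = [int(c) for c in re.findall(r"\d", text)]
    if len(digits) != 14 or len(set(digits)) == 1:
        return False  # wrong length, or an all-same-digit placeholder
    first = cnpj_check_digit(digits[:12], FIRST_WEIGHTS)
    second = cnpj_check_digit(digits[:12] + [first], SECOND_WEIGHTS)
    return digits[12:] == [first, second]
```

The structure mirrors the CPF check (two chained mod-11 digits), but the weights cycle from 2 to 9 instead of counting down from 10 or 11, so a CPF validator cannot simply be reused.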

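RG (Registro Geral, general registry card): a state-issued civil identity card whose format varies by state. São Paulo uses 2 letters followed by 5 to 9 digits, Rio de Janeiro uses 7 to 8 digits with a hyphen, Minas Gerais uses 7 to 9 digits, and other states have formats of their own. A tool that knows only one state's RG format will miss most RG numbers.

CNH (Carteira Nacional de Habilitação, national driver's license): an 11-digit license number that includes one check digit and a region code.

Voter ID (Título de Eleitor): a 12-digit voter identification number made up of an 8-digit identifier, a 2-digit state code, and 2 check digits.

SUS number (Sistema Único de Saúde card): a 15-digit public health identifier, one per person nationwide, that appears in all hospital and clinic records.

PIS/PASEP: an 11-digit social security program number that appears in all employment records.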

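The LGPD anonymization standard

Article 12 of the LGPD defines the standard for anonymized data: data that "cannot be identified, considering the reasonable technical means available at the time of processing". The standard moves with technology: data that is anonymous today will not necessarily stay anonymous as re-identification techniques improve.

The ANPD has issued further guidance: removing direct identifiers such as CPF and name is not enough, because combinations of quasi-identifiers can still allow re-identification. A combination of age band, city, gender, and occupation may be enough to single out one person, and it has to be handled by grouping values or adding noise.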
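
One way to make the quasi-identifier risk concrete is a k-anonymity style check: count how many records share each combination and flag combinations that are too rare. The sketch below is an illustration only. The field names, the threshold `k`, and the 10-year age band are assumptions; neither the LGPD nor the guidance described here prescribes specific values.

```python
from collections import Counter

# Hypothetical field names for the quasi-identifiers mentioned above.
QUASI_IDENTIFIERS = ("age_band", "city", "gender", "occupation")


def age_band(age: int, width: int = 10) -> str:
    """Generalize an exact age into a band, e.g. 37 -> '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def small_groups(records: list[dict], k: int = 5, keys=QUASI_IDENTIFIERS) -> dict:
    """Return quasi-identifier combinations shared by fewer than k records."""
    counts = Counter(tuple(r[key] for key in keys) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}


records = [
    {"age_band": age_band(34), "city": "Recife", "gender": "F", "occupation": "nurse"},
    {"age_band": age_band(38), "city": "Recife", "gender": "F", "occupation": "nurse"},
    {"age_band": age_band(52), "city": "Manaus", "gender": "M", "occupation": "pilot"},
]
risky = small_groups(records, k=2)  # combinations that point to a single record
```

Any combination returned by `small_groups` would need coarser grouping (wider age bands, region instead of city) or suppression before the dataset is treated as anonymized.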

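For AI training data, the ANPD requires one of three conditions: the data meets the Article 12 standard; each data subject has given explicit consent to the specific training use; or there is a valid, documented purpose for the processing.

Language requirements for Portuguese

Brazilian Portuguese differs from European Portuguese in vocabulary, spelling, and document formats. According to an ANPD technical assessment report, NLP models trained on European Portuguese text reach roughly 71% of the accuracy of models trained on local text.

The key differences for PII detection include:

  • Names: double-surname conventions and name order differ from those in Portugal.
  • Addresses: the CEP postal code uses the format XXXXX-XXX, which is unique to Brazil and needs dedicated detection logic.
  • Document terminology: Brazil uses "Carteira de Identidade" where Portugal uses "Bilhete de Identidade", and the names of government bodies differ as well.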
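
As a small illustration of the dedicated logic the CEP format calls for, the sketch below matches XXXXX-XXX while refusing matches that are part of a longer digit run. The pattern and function names are illustrative, and a real detector would likely also use nearby context such as the word "CEP" or a city name.

```python
import re

# Five digits, a hyphen, three digits, with no digit directly before or after.
CEP_PATTERN = re.compile(r"(?<!\d)\d{5}-\d{3}(?!\d)")


def find_ceps(text: str) -> list[str]:
    """Return CEP-shaped postal codes found in free text."""
    return CEP_PATTERN.findall(text)
```

The digit lookarounds keep the pattern from firing on the tail of a CPF or on a phone number such as 91234-5678, which shares the five-digit prefix but has four digits after the hyphen.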

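Technical requirements for ANPD compliance

Four technical requirements cover what ANPD compliance needs: CPF and CNPJ detection must include two-step check-digit validation; RG detection must cover every state; SUS numbers and voter IDs must be detected as well; and the NLP model must be trained on local Portuguese text.

See our guide to global PII identifier detection and our guide to 2024 LGPD enforcement actions.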

