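Danish CPR: Modulus-11 Validation for GDPR Compliance

67% of NLP tools do not implement modulus-11 validation for Danish CPR numbers. Datatilsynet took 14 enforcement actions in the healthcare sector in 2024. Secondary use of health data is a recurring problem.

George Curta · March 6, 2026 · 7 min read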


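Danish CPR Numbers: A GDPR Compliance Guide

Updated for 2026

In 2024 Datatilsynet, the Danish data protection authority, issued 31 GDPR decisions, 14 of which involved health data. That share, nearly half, reflects two realities: Denmark runs a large national health system, and persistent technical gaps in that system keep exposing patient records.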

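The CPR check-digit rule

The CPR number is Denmark's personal identification number: 10 digits in the format DDMMYY-XXXX. The first six digits are the date of birth; the last four are a sequence number whose final digit is the check digit.

The check digit (the tenth digit) follows a modulus-11 rule: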

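Take digits 1 through 9.
Give each digit a weight: 4, 3, 2, 7, 6, 5, 4, 3, 2.
Multiply each digit by its weight and add up the products.
Divide the sum by 11 and note the remainder.
Remainder 0 → the check digit is 0.
Remainder 1 → the number is invalid.
Remainder 2–10 → the check digit is 11 minus the remainder.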
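The steps above fit in a short function. This is a minimal Python sketch rather than a reference implementation, and the function names are hypothetical. The sample number 070761-4285 is used only because its arithmetic works out: 0×4 + 7×3 + 0×2 + 7×7 + 6×6 + 1×5 + 4×4 + 2×3 + 8×2 = 149, 149 mod 11 = 6, and 11 - 6 = 5, which matches the last digit.

```python
from typing import Optional

# Weights for CPR digits 1-9, in order, as listed in the steps above.
WEIGHTS = (4, 3, 2, 7, 6, 5, 4, 3, 2)


def expected_check_digit(first_nine: str) -> Optional[int]:
    """Return the check digit for the first nine CPR digits.

    Returns None when the remainder is 1, because no single digit
    can then satisfy the rule and the number is invalid.
    """
    if len(first_nine) != 9 or not (first_nine.isascii() and first_nine.isdigit()):
        raise ValueError("expected exactly nine ASCII digits")
    total = sum(int(d) * w for d, w in zip(first_nine, WEIGHTS))
    remainder = total % 11
    if remainder == 0:
        return 0
    if remainder == 1:
        return None
    return 11 - remainder


def passes_modulus_11(cpr: str) -> bool:
    """Check a CPR number written as DDMMYY-XXXX or DDMMYYXXXX."""
    digits = cpr.replace("-", "")
    if len(digits) != 10 or not (digits.isascii() and digits.isdigit()):
        return False
    return expected_check_digit(digits[:9]) == int(digits[9])


print(expected_check_digit("070761428"))  # 5
print(passes_modulus_11("070761-4285"))   # True
print(passes_modulus_11("070761-4286"))   # False
```

Returning None for remainder 1, instead of raising, lets a scanner treat "no valid check digit exists" as an ordinary rejection.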

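This rule matters for any tool that scans text for CPR numbers. Some strings in DDMMYY-XXXX format can never be valid, and a tool that skips this step flags dates, invoice numbers, and reference numbers as real identifiers.

The authority's 2024 review found that 67% of general-purpose NLP tools skipped this check, making it the most common technical failure in its healthcare cases.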
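A scanner can use the rule as a filter: match DDMMYY-XXXX candidates, drop those whose first six digits are not a real date, and drop those that fail modulus 11. The sketch below is illustrative only; the regular expression, the function names, and the shortcut of accepting a date if it exists in either 19YY or 20YY (the real century rule is not covered here) are assumptions. It folds the check digit into the sum with weight 1 and requires the total to be divisible by 11, which is equivalent to the rule above. One caution: according to the Danish CPR office, some CPR numbers issued since 2007 do not satisfy modulus 11, so a failed check is better read as lower confidence than as proof that a string is not a CPR number.

```python
import re
from datetime import date
from typing import List

# DDMMYY-XXXX with an optional hyphen; the match must not touch other digits.
CPR_CANDIDATE = re.compile(r"(?<!\d)(\d{2})(\d{2})(\d{2})-?(\d{4})(?!\d)")

# Weights 4,3,2,7,6,5,4,3,2 for digits 1-9, plus weight 1 for the check digit.
FULL_WEIGHTS = (4, 3, 2, 7, 6, 5, 4, 3, 2, 1)


def date_exists(dd: int, mm: int, yy: int) -> bool:
    """Assumption: accept the date if it exists in 19YY or 20YY."""
    for century in (1900, 2000):
        try:
            date(century + yy, mm, dd)
            return True
        except ValueError:
            pass
    return False


def passes_modulus_11(digits: str) -> bool:
    return sum(int(d) * w for d, w in zip(digits, FULL_WEIGHTS)) % 11 == 0


def find_cpr_numbers(text: str) -> List[str]:
    hits = []
    for match in CPR_CANDIDATE.finditer(text):
        dd, mm, yy, tail = match.groups()
        if not date_exists(int(dd), int(mm), int(yy)):
            continue  # e.g. an invoice code whose "date" is 31 February
        if not passes_modulus_11(dd + mm + yy + tail):
            continue  # a production tool might keep these as low-confidence hits
        hits.append(match.group(0))
    return hits


sample = "Patient 070761-4285 seen; invoice 310299-1234; ref 070761-4286."
print(find_cpr_numbers(sample))  # ['070761-4285']
```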

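Denmark's five health registries

Denmark links health data across five national registries, tied together by the personal identification number:

Hospital discharge records (since 1977)
Prescription data (since 1995)
Cancer registry (since 1943)
Cause-of-death registry (since 1970)
Primary care diagnoses (since 1990)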

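This makes Danish health data exceptionally valuable for medical research, and it also creates risk. Deleting the direct identifiers is not enough: a dataset that still holds age, sex, diagnosis, and year can re-identify individuals, especially patients with rare diseases.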
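One way to see the risk is to count how many records share each combination of the remaining fields; a combination held by a single record points straight at one person. The sketch below assumes records are plain dictionaries with hypothetical field names, and the threshold k is a parameter you choose, not a value set by Datatilsynet.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical quasi-identifiers left in a "de-identified" extract.
QUASI_IDENTIFIERS = ("age_band", "sex", "diagnosis", "year")


def group_sizes(records: List[dict], keys=QUASI_IDENTIFIERS) -> Counter:
    """Count how many records share each combination of quasi-identifiers."""
    return Counter(tuple(record[k] for k in keys) for record in records)


def small_groups(records: List[dict], k: int, keys=QUASI_IDENTIFIERS) -> Dict[Tuple, int]:
    """Return combinations held by fewer than k records."""
    return {combo: n for combo, n in group_sizes(records, keys).items() if n < k}


records = [
    {"age_band": "40-49", "sex": "F", "diagnosis": "common_dx", "year": 2020},
    {"age_band": "40-49", "sex": "F", "diagnosis": "common_dx", "year": 2020},
    {"age_band": "40-49", "sex": "F", "diagnosis": "common_dx", "year": 2020},
    {"age_band": "40-49", "sex": "M", "diagnosis": "rare_dx", "year": 2020},
]

print(min(group_sizes(records).values()))  # 1: the rare diagnosis is unique
print(small_groups(records, k=3))
```

The smallest group size is also the figure the processing documentation described below asks for, so the same count serves both the risk check and the written record.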

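Datatilsynet's 2024 guidance on secondary use of health data sets three requirements.

Document the processing steps: List which fields were removed, which were rounded or grouped, and the group sizes the output reaches. A policy statement does not meet this standard.

External review for large datasets: For datasets covering more than 5,000 people, the authority recommends an independent technical review of the de-identification steps.

Match the data to the research goal: The dataset must fit the stated research objective. The authority found cases where a full national registry was used when a smaller sample would have sufficed.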
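A processing record can be kept as structured data rather than prose, which makes the first two requirements easy to check. The class, field names, and JSON layout below are a hypothetical format, not one Datatilsynet prescribes; only the 5,000-person threshold comes from the guidance described above.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List

# From the guidance above: independent review is recommended above 5,000 people.
EXTERNAL_REVIEW_THRESHOLD = 5_000


@dataclass
class DeidentificationRecord:
    """Hypothetical record format; not a template issued by Datatilsynet."""
    dataset: str
    research_goal: str
    people: int
    removed_fields: List[str] = field(default_factory=list)
    # Field name -> how it was rounded or grouped.
    generalized_fields: Dict[str, str] = field(default_factory=dict)
    # Smallest number of records sharing one combination of remaining fields.
    smallest_group_size: int = 0

    @property
    def external_review_recommended(self) -> bool:
        return self.people > EXTERNAL_REVIEW_THRESHOLD

    def to_json(self) -> str:
        data = asdict(self)  # asdict skips properties, so add the flag by hand
        data["external_review_recommended"] = self.external_review_recommended
        return json.dumps(data, indent=2, ensure_ascii=False)


record = DeidentificationRecord(
    dataset="readmissions_extract",
    research_goal="30-day readmission after hip surgery",
    people=12_000,
    removed_fields=["cpr", "name", "address"],
    generalized_fields={"birth_date": "5-year age band", "admission_date": "year"},
    smallest_group_size=11,
)
print(record.to_json())
```

Keeping the stated research goal next to the field list also gives a reviewer a direct way to ask whether each retained field is needed for that goal.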

For how check-digit rules apply to other European ID formats, see our guide to EU national ID detection.

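Common findings from the 2024 cases

The 14 healthcare cases show three recurring failure modes.

Research data sharing: A hospital sent a de-identified patient dataset to an academic partner for AI training. The dataset included partial birth dates, diagnosis codes, and treatment dates. The authority found that this combination could re-identify patients with rare diseases, because an unusual diagnosis quickly narrowed the pool of candidates.

Third-party AI services: A health-tech company sent patient records to a US AI provider for clinical-note processing. Personal identifiers were not removed first, and no valid data-transfer mechanism was in place.

OCR pipeline gaps: An insurer processed scanned PDF forms for disability claims. Its OCR tool converted the images to text, but no check-digit test was run on the output, and many identifiers went undetected.

OCR often inserts spaces inside numbers or shifts hyphens, and simple pattern matching fails on that output. Detection has to handle OCR text, not just clean input. For the processing steps for scanned documents, see our guide to OCR detection in healthcare.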
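Detection that tolerates OCR output has to allow separators inside the number. A minimal sketch, assuming OCR damage is limited to single spaces or hyphens between digits (other errors, such as reading the letter O as 0, are not handled): it accepts ten digits with at most one separator between any two, strips the separators, and then applies a date check and the modulus-11 check. The regex, the function names, and the 19YY/20YY date shortcut are assumptions.

```python
import re
from datetime import date
from typing import List

# Ten digits with at most one space or hyphen between neighbours. The
# lookahead makes each match zero-width, so candidates may overlap and an
# earlier false candidate cannot swallow the start of a real number.
OCR_CANDIDATE = re.compile(r"(?<!\d)(?=(\d(?:[ -]?\d){9})(?!\d))")
FULL_WEIGHTS = (4, 3, 2, 7, 6, 5, 4, 3, 2, 1)


def date_exists(dd: int, mm: int, yy: int) -> bool:
    """Assumption: accept the date if it exists in 19YY or 20YY."""
    for century in (1900, 2000):
        try:
            date(century + yy, mm, dd)
            return True
        except ValueError:
            pass
    return False


def find_cpr_in_ocr_text(text: str) -> List[str]:
    found = []
    for match in OCR_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group(1))
        dd, mm, yy = int(digits[0:2]), int(digits[2:4]), int(digits[4:6])
        if not date_exists(dd, mm, yy):
            continue
        if sum(int(d) * w for d, w in zip(digits, FULL_WEIGHTS)) % 11 == 0:
            # Report the number in canonical DDMMYY-XXXX form.
            found.append(digits[:6] + "-" + digits[6:])
    return found


print(find_cpr_in_ocr_text("CPR: 0707 61-42 85."))    # ['070761-4285']
print(find_cpr_in_ocr_text("Ref 12 34 070761-4285"))  # ['070761-4285']
```

The second call shows why overlapping candidates matter: "12 34 070761" is tried first and rejected on its date, and the real number starting at "070761" is still examined.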

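Three technical essentials

These three elements form the basis of GDPR compliance in Danish healthcare.

Check-digit testing on all text: Run the full modulus-11 check on every candidate string, in clean text and OCR output alike.

Danish name detection: Use a model trained on Danish text; spaCy's da_core_news models are one option. General-purpose English models miss Danish personal and organization names.

De-identification records: Document what was removed, what was grouped, and the resulting group sizes. The authority requires this as technical documentation, not a policy statement.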
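For cases like the third-party AI service above, the first essential comes down to replacing verified numbers before text leaves your systems. A minimal sketch, with a hypothetical placeholder token and function name; it covers CPR numbers only, so names still need a Danish NER model, and the returned count can feed the de-identification record.

```python
import re
from typing import Tuple

# DDMMYY-XXXX with an optional hyphen, not touching other digits.
CPR_CANDIDATE = re.compile(r"(?<!\d)(\d{6})-?(\d{4})(?!\d)")
FULL_WEIGHTS = (4, 3, 2, 7, 6, 5, 4, 3, 2, 1)


def passes_modulus_11(digits: str) -> bool:
    return sum(int(d) * w for d, w in zip(digits, FULL_WEIGHTS)) % 11 == 0


def redact_cpr(text: str, placeholder: str = "[CPR]") -> Tuple[str, int]:
    """Replace candidates that pass modulus 11; return new text and count."""
    count = 0

    def replace(match) -> str:
        nonlocal count
        if passes_modulus_11(match.group(1) + match.group(2)):
            count += 1
            return placeholder
        return match.group(0)  # leave numbers that fail the check untouched

    return CPR_CANDIDATE.sub(replace, text), count


clean, n = redact_cpr("Patient 070761-4285 referred. Invoice 070761-4286.")
print(clean)  # Patient [CPR] referred. Invoice 070761-4286.
print(n)      # 1
```

Over-redacting is usually the safer failure when text is about to leave your environment, so you may prefer to replace every candidate and use the check only to rank hits.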

For figures on the cost of healthcare data security incidents, see our healthcare data breach cost analysis.

