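Large Language Models Miss 50% of Clinical PHI

A 2025 study found that LLM tools missed more than 50% of clinical protected health information (PHI) in multilingual documents. Separately, 34.8% of ChatGPT inputs contain sensitive data.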

George Curta | April 2, 2026 | 9 min read

LLM PHI detection, HIPAA de-identification, clinical NLP, Safe Harbor method, healthcare AI compliance

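The 50% Miss Rate Problem

A 2025 review (arXiv:2509.14464) tested LLM tools on clinical notes, and the results are worrying: the tools missed more than 50% of clinical PHI in multilingual documents. The reason is straightforward: LLMs are built to generate text, not to perform the high-recall detection that HIPAA de-identification demands.

The HIPAA Safe Harbor method lists 18 categories of protected identifiers, including names, dates, telephone numbers, Social Security numbers, medical record numbers, health plan IDs, device IDs, and IP addresses. Each category needs its own detection logic.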
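One way to keep track of "each category needs its own detection logic" is a coverage check: enumerate the 18 Safe Harbor categories and confirm every one has a detector assigned. The sketch below is illustrative only. The category names paraphrase 45 CFR § 164.514(b)(2); the `detectors` mapping and the `uncovered` helper are hypothetical and not part of any real tool.

```python
from enum import Enum


class SafeHarborCategory(Enum):
    """The 18 Safe Harbor identifier categories (paraphrased)."""
    NAMES = "names"
    GEOGRAPHIC = "geographic subdivisions smaller than a state"
    DATES = "dates (except year) and ages over 89"
    PHONE = "telephone numbers"
    FAX = "fax numbers"
    EMAIL = "email addresses"
    SSN = "Social Security numbers"
    MRN = "medical record numbers"
    HEALTH_PLAN = "health plan beneficiary numbers"
    ACCOUNT = "account numbers"
    LICENSE = "certificate/license numbers"
    VEHICLE = "vehicle identifiers and serial numbers"
    DEVICE = "device identifiers and serial numbers"
    URL = "web URLs"
    IP = "IP addresses"
    BIOMETRIC = "biometric identifiers"
    PHOTO = "full-face photographs and comparable images"
    OTHER_UNIQUE = "any other unique identifying number, characteristic, or code"


def uncovered(detectors):
    """Return the categories that have no detector registered."""
    return [c for c in SafeHarborCategory if c not in detectors]


# Hypothetical assignment: which kind of detector handles which category.
detectors = {
    SafeHarborCategory.SSN: "regex",
    SafeHarborCategory.NAMES: "ner",
    SafeHarborCategory.MRN: "custom",
}

print(f"{len(uncovered(detectors))} of {len(SafeHarborCategory)} categories lack a detector")
```

A check like this makes gaps visible before any notes are processed, rather than after an audit finds them.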

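Clinical notes make the task harder. Take this sentence: "Pt. John D., DOB 4/12/67, MRN 1234567, admitted 03/15/24, Dr. Smith ordered ECG." A single sentence carries five protected identifiers and leans heavily on abbreviations. A model built to understand clinical meaning often fails when asked to detect identifiers instead.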
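To make the example concrete, here is a minimal sketch that pulls the five identifiers out of that sentence using abbreviation cues. The patterns and label names are assumptions chosen to fit this one note; real clinical text would need far broader rules.

```python
import re

# Illustrative patterns keyed on clinical shorthand. Labels and formats are
# assumptions for this example, not a complete Safe Harbor rule set.
PATTERNS = {
    "PATIENT_NAME": re.compile(r"\bPt\.\s+([A-Z][a-z]+(?:\s+[A-Z]\.)?)"),
    "BIRTH_DATE": re.compile(r"\bDOB\s+(\d{1,2}/\d{1,2}/\d{2,4})"),
    "MRN": re.compile(r"\bMRN\s+(\d{6,10})\b"),
    "ADMIT_DATE": re.compile(r"\badmitted\s+(\d{1,2}/\d{1,2}/\d{2,4})"),
    "PROVIDER_NAME": re.compile(r"\bDr\.\s+([A-Z][a-z]+)"),
}


def find_identifiers(text):
    """Return (label, value, start, end) tuples in order of appearance."""
    hits = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            hits.append((label, m.group(1), m.start(1), m.end(1)))
    return sorted(hits, key=lambda h: h[2])


note = "Pt. John D., DOB 4/12/67, MRN 1234567, admitted 03/15/24, Dr. Smith ordered ECG."
for label, value, start, end in find_identifiers(note):
    print(f"{label:14} {value!r} [{start}:{end}]")
```

Every pattern here depends on a cue word (Pt., DOB, MRN, admitted, Dr.) sitting directly before the value, so the same identifiers written without those cues would be missed.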

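How and Why LLMs Miss PHI

LLM tools fail on clinical notes in predictable ways.

Abbreviated identifiers: Clinical notes are full of shorthand such as DOB, MRN, and Pt. A model tuned to understand clinical meaning may not recognize "Pt. John D." as a name; extracting sensitive data requires a different objective altogether.

Context-dependent dates: Not all dates carry the same risk. "Age 67" is a soft identifier, "DOB 4/12/67" is a direct protected identifier, and "03/15/24" is equally protected as an admission date. Pattern matching alone is not enough.

Non-US formats: The Cyberhaven Q4 2025 report found that 34.8% of ChatGPT inputs contain sensitive data, including multilingual PII. In healthcare, that means non-US record IDs, region-specific date formats, and local health ID types, all of which tools trained on US data miss systematically.

Hospital-specific identifiers: Each hospital uses its own medical record number formats, staff IDs, and campus codes, none of which appear in standard NER training data. Tools without custom entity support cannot detect them.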
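The point about context-dependent dates can be sketched in a few lines: a bare date pattern finds the date, but only the surrounding words say what kind of date it is. The cue lists, label names, and the 20-character window below are assumptions for illustration.

```python
import re

DATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")

# Hypothetical cue words mapped to date roles.
CUES = [
    (re.compile(r"\b(?:DOB|born)\b", re.I), "BIRTH_DATE"),
    (re.compile(r"\b(?:admitted|adm)\b", re.I), "ADMISSION_DATE"),
    (re.compile(r"\b(?:discharged|d/c)\b", re.I), "DISCHARGE_DATE"),
]


def classify_dates(text, window=20):
    """Label each date by the nearest cue word in the preceding window."""
    results = []
    for m in DATE.finditer(text):
        context = text[max(0, m.start() - window):m.start()]
        label, best_end = "DATE_UNSPECIFIED", -1
        for cue, cue_label in CUES:
            for c in cue.finditer(context):
                if c.end() > best_end:  # the cue closest to the date wins
                    best_end, label = c.end(), cue_label
        # Every date is still reported; the label only adds context.
        results.append((m.group(), label))
    return results


note = "Pt. John D., DOB 4/12/67, MRN 1234567, admitted 03/15/24, Dr. Smith ordered ECG."
print(classify_dates(note))
```

A date with no recognizable cue still comes back as `DATE_UNSPECIFIED` instead of being dropped, which keeps recall high while leaving the handling decision to a later step.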

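Compliance Risk for Research Datasets

A hospital building a research dataset from 500,000 medical records faces a real compliance challenge. HIPAA requires that the risk of re-identification in de-identified data be "very small," and a tool that misses half of the protected identifiers cannot meet that bar.

Research archives are not clean, structured data: records span multiple departments and time periods, and sometimes multiple languages. A tool that performs well on billing data can fail completely on narrative notes, because sensitive data in free text has no field labels to rely on.

Institutional review board (IRB) approval adds further requirements: the institution must document the method used, the identifier types removed, and the verification steps performed. A tool with a 50% miss rate cannot satisfy them.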
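The three items an IRB asks for (method, identifier types removed, verification steps) map naturally onto a small run record that is written out alongside the de-identified data. The sketch below is one possible shape; the `DeidRunRecord` class and its field names are hypothetical, not a required or standard format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DeidRunRecord:
    """Hypothetical audit record for one de-identification run."""
    method: str
    identifier_types_removed: list[str]
    verification_steps: list[str]
    removals_by_type: dict[str, int] = field(default_factory=dict)

    def log_removal(self, identifier_type: str, count: int = 1) -> None:
        """Accumulate how many items of each type were removed."""
        current = self.removals_by_type.get(identifier_type, 0)
        self.removals_by_type[identifier_type] = current + count

    def to_json(self) -> str:
        """Serialize the record for inclusion in review documentation."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


record = DeidRunRecord(
    method="HIPAA Safe Harbor, 45 CFR 164.514(b)(2)",
    identifier_types_removed=["names", "dates", "medical record numbers"],
    verification_steps=["manual review of a random sample"],
)
record.log_removal("names", 3)
record.log_removal("dates", 2)
record.log_removal("names")
print(record.to_json())
```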

See our compliance overview and security practices for how anonym.legal supports HIPAA-related work.

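The Three-Layer Detection Approach

The 2025 review found a clear pattern: the tools with the lowest miss rates all used a three-layer detection architecture.

Layer 1, regular expressions: Detects structured identifiers such as Social Security numbers, medical record numbers, phone numbers, and health plan IDs. Reliable for fixed formats.

Layer 2, NER: Uses a Transformer model to detect names, dates, and other sensitive data in narrative text, handling cases regular expressions cannot.

Layer 3, custom entities: Handles institution-specific formats such as proprietary medical record number patterns, staff IDs, and campus codes. No standard model covers these.

ML-only tools degrade on abbreviations and non-English text; regex-only tools cannot find sensitive data that lacks field labels. Neither is sufficient on its own.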
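The three layers can be sketched as independent detectors whose spans are merged before redaction. This is a simplified illustration, not the architecture of any specific tool: the patterns are assumptions, the `STF-` staff ID format is invented, and `toy_ner_layer` is a regex stand-in for the Transformer NER model so the example runs without external dependencies.

```python
import re
from typing import NamedTuple


class Span(NamedTuple):
    start: int
    end: int
    label: str
    layer: str


def pattern_layer(patterns, layer_name, group=0):
    """Build a detector that returns a Span for every pattern match."""
    compiled = {label: re.compile(p) for label, p in patterns.items()}

    def run(text):
        return [Span(m.start(group), m.end(group), label, layer_name)
                for label, rx in compiled.items() for m in rx.finditer(text)]
    return run


# Layer 1: fixed-format identifiers.
regex_layer = pattern_layer(
    {"SSN": r"\b\d{3}-\d{2}-\d{4}\b", "PHONE": r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"},
    "regex")
# Layer 2: stand-in for a Transformer NER model (assumption: a real system
# would call a trained model here instead of a title-based pattern).
toy_ner_layer = pattern_layer({"NAME": r"\b(?:Dr|Pt)\.\s+([A-Z][a-z]+)"}, "ner", group=1)
# Layer 3: hypothetical hospital-specific staff ID format.
custom_layer = pattern_layer({"STAFF_ID": r"\bSTF-\d{5}\b"}, "custom")


def merge(spans):
    """Keep earlier, then longer spans; drop spans overlapping a kept one."""
    kept = []
    for s in sorted(spans, key=lambda s: (s.start, -(s.end - s.start))):
        if not kept or s.start >= kept[-1].end:
            kept.append(s)
    return kept


def redact(text, layers):
    """Replace every merged span with its bracketed label."""
    out, cursor = [], 0
    for s in merge([s for layer in layers for s in layer(text)]):
        out.append(text[cursor:s.start])
        out.append(f"[{s.label}]")
        cursor = s.end
    out.append(text[cursor:])
    return "".join(out)


text = "Dr. Smith (STF-00412) called 555-867-5309 re: SSN 123-45-6789."
print(redact(text, [regex_layer]))
print(redact(text, [regex_layer, toy_ner_layer, custom_layer]))
```

With only the regex layer, the name and staff ID pass through untouched; adding the other two layers catches them, which is the gap the paragraph above describes.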

The review found that only three-layer designs kept the miss rate below 5%, the level presented here as the baseline for HIPAA Safe Harbor compliance.
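Miss rate here means the share of known identifiers a tool fails to find, or 1 minus recall, measured against a hand-labelled set. A minimal sketch of that check follows, assuming exact span matching; the `(doc_id, start, end)` tuple shape and the sample numbers are invented for illustration.

```python
def miss_rate(gold, predicted):
    """Fraction of gold-standard spans the tool failed to find (1 - recall)."""
    gold, predicted = set(gold), set(predicted)
    if not gold:
        raise ValueError("gold set is empty; miss rate is undefined")
    return len(gold - predicted) / len(gold)


# Hypothetical labelled set: 40 identifier spans as (doc_id, start, end).
gold = [("note-1", i * 10, i * 10 + 5) for i in range(40)]
# Hypothetical tool output: finds all but one gold span, plus one false positive.
predicted = gold[:-1] + [("note-1", 999, 1004)]

rate = miss_rate(gold, predicted)
print(f"miss rate: {rate:.1%}, under 5% threshold: {rate < 0.05}")
```

False positives do not affect the miss rate, so a tool can pass this check while over-redacting; precision needs to be tracked separately.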

See our guide to HIPAA Safe Harbor de-identification for research for next steps.



