
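Batch-Processing 50,000 Clinical Notes Locally: A HIPAA Compliance Guide

In February 2026, the U.S. District Court for the Southern District of New York ruled that documents processed through AI without prior anonymization lose attorney-client privilege.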

George Curta | April 11, 2026 | 8 min read

batch PHI de-identification, clinical notes processing, HIPAA local processing, research dataset compliance, IRB requirements

Running 50,000 Clinical Notes Locally: A HIPAA Compliance Guide

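Research teams that need to de-identify large archives of medical records tend to hit the same wall: cloud tools struggle at this volume, many regulations require that the data be processed on-premises, and manual review takes too long. Local batch processing is the only way out.

This guide covers the key regulatory requirements, how to configure the system, and which records you need to keep.

See our compliance overview and security practices for how we support HIPAA.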

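Why cloud solutions don't work

The HIPAA Expert Determination Method sets a clear standard: the risk that de-identified data could be re-identified must be "very small," and a qualified expert must make and document that determination. Institutional review boards (IRBs) that approve research on de-identified patient data also expect documentation: you must record the method used, the entity types removed, and the quality checks performed.

This documentation requirement is critical. De-identification cannot be a black box: you must be able to state what was found, what was removed, and how the results were verified.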

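Uploading 50,000 files to a cloud API is slow and expensive, and rate limits and long transfer times make it close to unworkable. For large research datasets, cloud processing is rarely practical.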

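HIPAA adds a second concern: sending protected health information (PHI) to a business associate, even a de-identification vendor, requires a signed business associate agreement (BAA). In IRB research settings, BAA rules can intersect with the IRB's data use terms, which usually calls for legal review. Local processing removes the data-transfer concern at its root.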

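Why the privilege-waiver ruling matters

In February 2026, the U.S. District Court for the Southern District of New York ruled that documents processed through AI without prior anonymization lose attorney-client privilege. The court found that sending privileged documents to an external AI service constitutes a "disclosure," and that the disclosure extinguishes privilege over the analyzed content.

The parallel in healthcare is easy to see: physician notes sent to a cloud natural language processing tool, or psychotherapy notes sent to an external AI service, face a similar risk. Local processing, where files never leave the machine, avoids that risk at its root.

See our guide to HIPAA cloud and zero-knowledge PHI protection for more ways to keep data local.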

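How to set up processing for 50,000 notes

Batch size: Depending on your plan, the desktop application processes 1 to 5,000 files per batch. Ten batches of 5,000 notes cover all 50,000 in a single overnight job, with no manual intervention.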
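The batch arithmetic can be sketched in a few lines. This is a minimal illustration, assuming a flat list of file names and a 5,000-file batch limit; `plan_batches` and the file names are hypothetical and not part of the desktop application.

```python
from typing import List, Sequence


def plan_batches(files: Sequence[str], batch_size: int = 5000) -> List[List[str]]:
    """Split a file list into consecutive batches of at most batch_size files."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [list(files[i:i + batch_size]) for i in range(0, len(files), batch_size)]


# Hypothetical file names standing in for a real notes directory.
notes = [f"note_{i:05d}.txt" for i in range(50_000)]
batches = plan_batches(notes)
print(len(batches), len(batches[0]), len(batches[-1]))  # prints: 10 5000 5000
```

A smaller plan limit only changes `batch_size`; the number of batches grows accordingly.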

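Processing speed: Running 1 to 5 files concurrently raises overall throughput, so a single overnight job can finish the full run without further attention.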
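Bounded concurrency of this kind might look like the sketch below, which uses Python's standard `ThreadPoolExecutor`. The 1-to-5 limit mirrors the range mentioned above; `run_concurrently` and `fake_deidentify` are hypothetical placeholders, not the application's internals.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def run_concurrently(
    paths: Iterable[str],
    worker: Callable[[str], int],
    max_workers: int = 5,
) -> Dict[str, int]:
    """Apply worker to each path with at most max_workers files in flight."""
    if not 1 <= max_workers <= 5:
        raise ValueError("max_workers must be between 1 and 5")
    path_list = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with path_list.
        return dict(zip(path_list, pool.map(worker, path_list)))


def fake_deidentify(path: str) -> int:
    """Placeholder worker: returns a made-up entity count for the file."""
    return len(path)


results = run_concurrently(["a.txt", "bb.txt", "ccc.txt"], fake_deidentify, max_workers=3)
print(results)  # prints: {'a.txt': 5, 'bb.txt': 6, 'ccc.txt': 7}
```

Threads suit I/O-bound work such as reading files; a CPU-bound detector would more likely need a process pool.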

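Entity types: Healthcare-specific entity types include medical record number (MRN) formats, National Provider Identifier (NPI) numbers, Drug Enforcement Administration (DEA) numbers, health plan IDs, and HIPAA date formats. Configure them once in a named preset and apply that preset to every batch, so that de-identification stays consistent across all files.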
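To make the idea of a reusable preset concrete, here is a rough sketch that pairs regular expressions with check-digit validation. It assumes the commonly documented rules: an NPI is 10 digits that pass the Luhn check once prefixed with 80840, and a DEA number is two leading characters plus seven digits whose last digit is a checksum. The `PRESET` structure and the MRN pattern are hypothetical (MRN formats are site-specific), and this is not the product's detection engine.

```python
import re
from typing import List, Tuple


def luhn_ok(digits: str) -> bool:
    """Standard Luhn check over a string of decimal digits."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0


def npi_ok(candidate: str) -> bool:
    """NPI check: 10 digits that pass Luhn once prefixed with 80840."""
    return bool(re.fullmatch(r"\d{10}", candidate)) and luhn_ok("80840" + candidate)


def dea_ok(candidate: str) -> bool:
    """DEA check: two leading characters, then seven digits ending in a check digit."""
    if not re.fullmatch(r"[A-Za-z][A-Za-z9]\d{7}", candidate):
        return False
    d = [int(c) for c in candidate[2:]]
    total = d[0] + d[2] + d[4] + 2 * (d[1] + d[3] + d[5])
    return total % 10 == d[6]


# Hypothetical preset: one pattern and an optional validator per entity type.
PRESET = {
    "NPI": (re.compile(r"\b\d{10}\b"), npi_ok),
    "DEA": (re.compile(r"\b[A-Za-z][A-Za-z9]\d{7}\b"), dea_ok),
    "MRN": (re.compile(r"\bMRN[:#]?\s*\d{6,10}\b"), None),  # assumed site format
}


def find_entities(text: str) -> List[Tuple[str, str]]:
    """Return (entity type, matched text) pairs that pass their validator."""
    hits = []
    for label, (pattern, validator) in PRESET.items():
        for match in pattern.finditer(text):
            if validator is None or validator(match.group()):
                hits.append((label, match.group()))
    return hits


sample = "Seen by NPI 1234567893, DEA AB1234563. MRN: 00123456. Ref 1234567890."
print(find_entities(sample))
# prints: [('NPI', '1234567893'), ('DEA', 'AB1234563'), ('MRN', 'MRN: 00123456')]
```

The trailing `1234567890` is ten digits but fails the check digit, so it is not reported as an NPI. Checksums cut false positives; they do not replace review of what the patterns miss.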

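Audit log: Each batch job automatically exports a CSV or JSON file recording the file name, detected entity types, confidence scores, and timestamps. The log supports the documentation that IRB review and Expert Determination call for, showing what was found and removed in each file.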
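An audit log with those four fields could be shaped roughly as follows. The column names and the `AuditRow` type are assumptions for illustration; the application's actual export schema may differ. The sketch deliberately logs the entity type rather than the matched text, so the log itself carries no PHI values.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import List

FIELDS = ["file_name", "entity_type", "confidence", "timestamp"]


@dataclass
class AuditRow:
    file_name: str
    entity_type: str
    confidence: float
    timestamp: str  # ISO 8601, UTC


def make_row(file_name: str, entity_type: str, confidence: float) -> AuditRow:
    """Build one log row stamped with the current UTC time."""
    now = datetime.now(timezone.utc).isoformat()
    return AuditRow(file_name, entity_type, round(confidence, 3), now)


def to_csv(rows: List[AuditRow]) -> str:
    """Serialize rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buf.getvalue()


def to_json(rows: List[AuditRow]) -> str:
    """Serialize rows as a JSON array of objects."""
    return json.dumps([asdict(r) for r in rows], indent=2)


rows = [
    make_row("note_00001.txt", "MRN", 0.9871),
    make_row("note_00001.txt", "NPI", 0.95),
]
print(to_csv(rows))
print(to_json(rows))
```

Writing to an in-memory buffer keeps the example self-contained; a real job would write one file per batch alongside the output.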

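IRB documentation checklist

Before submitting your IRB protocol, confirm that you can provide the following evidence:

Name and version of the de-identification tool
Complete list of entity types included in the preset
Test results on a held-out sample
Processing log for every batch run (file names, entity counts, timestamps)
Records showing that PHI never left the local environment

Local batch processing makes each of these easy to produce: logs are generated automatically, presets are saved and versioned, and the local boundary is clear.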
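For the held-out sample item on the checklist, one common way to report results is exact-match precision and recall against hand-labelled spans. The sketch below assumes spans are represented as (file name, start offset, end offset, entity type) tuples; the data is invented for illustration and is not a real test result.

```python
from typing import Set, Tuple

Span = Tuple[str, int, int, str]  # (file name, start offset, end offset, entity type)


def precision_recall(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float]:
    """Exact-match precision and recall over labelled spans."""
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall


# Invented example: three of four gold spans are found, plus one spurious hit.
gold = {
    ("n1.txt", 0, 10, "MRN"),
    ("n1.txt", 20, 30, "NPI"),
    ("n2.txt", 5, 14, "DEA"),
    ("n2.txt", 40, 50, "DATE"),
}
predicted = {
    ("n1.txt", 0, 10, "MRN"),
    ("n1.txt", 20, 30, "NPI"),
    ("n2.txt", 5, 14, "DEA"),
    ("n2.txt", 60, 70, "DATE"),
}
p, r = precision_recall(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f}")  # prints: precision=0.75 recall=0.75
```

For de-identification, recall is usually the number reviewers look at first, since a missed identifier is a residual re-identification risk while a false positive only costs some data utility.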


