Safe Harbor de-identification: remove all 18 HIPAA PHI identifiers – CCPA/HIPAA-compliant de-identification per 45 CFR §164.514(b)(2)

The HIPAA Safe Harbor method under 45 CFR §164.514(b)(2) requires a covered entity to remove all 18 specified identifier categories, for the individual and for the individual's relatives, employers, and household members, and to have no actual knowledge that the remaining information could identify the individual. anonym.legal systematically detects and removes each category, from patient names and dates to device identifiers and IP addresses, producing a dataset that is no longer PHI under the Privacy Rule.

When this applies

Apply this workflow whenever a covered entity or business associate needs to de-identify a dataset containing PHI so that it falls outside the Privacy Rule's scope. Common use cases include sharing data with researchers, publishing quality-improvement findings, or supplying a vendor with training data when no IRB waiver or data-use agreement is in place.

How it works

  1. Upload the dataset (CSV, XLSX, HL7 FHIR JSON, PDF, or DOCX) to anonym.legal; the engine preserves record structure and clinical coding.
  2. The engine scans for all 18 identifier categories listed at §164.514(b)(2)(i)(A) through (R): (1) names; (2) geographic subdivisions smaller than a state, including street address, city, county, precinct, ZIP code, and equivalent geocodes; (3) all elements of dates (except year) directly related to an individual, including birth date, admission date, discharge date, and date of death, plus all ages over 89; (4) telephone numbers; (5) fax numbers; (6) email addresses; (7) Social Security numbers; (8) medical record numbers; (9) health plan beneficiary numbers; (10) account numbers; (11) certificate or license numbers; (12) vehicle identifiers and serial numbers, including license plate numbers; (13) device identifiers and serial numbers; (14) web URLs; (15) IP addresses; (16) biometric identifiers, including finger and voice prints; (17) full-face photographs and any comparable images; and (18) any other unique identifying number, characteristic, or code.
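  3. Each detected identifier is replaced with a consistent pseudonym, synthetic code, or redaction token depending on field type; date fields are generalized to year only. Retaining any finer date granularity takes the dataset outside Safe Harbor, so an analysis that needs it calls for Expert Determination under §164.514(b)(1) or a limited data set under §164.514(e) instead.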
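  4. ZIP codes are truncated to their first three digits; where the combined population of all ZIP codes sharing those three digits is 20,000 or fewer according to current Census Bureau data, the three digits are set to 000 per §164.514(b)(2)(i)(B). The engine applies this automatically.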
  5. Ages over 89, and any date elements (including year) indicative of such an age, are aggregated into a single '90 or older' category per §164.514(b)(2)(i)(C).
  6. A de-identification certificate is generated listing every field transformed and the rule applied, supporting organizational documentation of the Safe Harbor compliance process.
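  7. The de-identified file is released; the reversible mapping table (if pseudonymization rather than full redaction was chosen) is stored with US data residency and role-based access control. Under §164.514(c), the re-identification code must not be derived from information about the individual, and the re-identification mechanism must not be disclosed with the data.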
  8. For ongoing pipelines, the configuration is saved as a reusable Safe Harbor profile so subsequent data extracts are processed consistently.
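Steps 3 and 6 can be illustrated with a minimal Python sketch. It is not the anonym.legal engine: the `Pseudonymizer` class, the `transform_record` function, and the field names `patient_name` and `admission_date` are hypothetical names chosen for illustration. Tokens are random rather than computed from the value they replace, which is one way to keep the code from being derived from information about the individual.

```python
import secrets
from datetime import date


class Pseudonymizer:
    """Hands out one random token per distinct value, so repeats stay linked."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        # Reversible mapping table: original value -> token. Store under access control.
        self.mapping: dict[str, str] = {}

    def token_for(self, value: str) -> str:
        if value not in self.mapping:
            # The token is random, not computed from the value it replaces.
            self.mapping[value] = f"{self.prefix}-{secrets.token_hex(8)}"
        return self.mapping[value]


def transform_record(record: dict, names: Pseudonymizer, audit: list) -> dict:
    """Pseudonymize the name, reduce the date to its year, and log each rule applied."""
    out = dict(record)
    out["patient_name"] = names.token_for(record["patient_name"])
    audit.append(("patient_name", "name replaced with pseudonym"))
    out["admission_date"] = str(record["admission_date"].year)
    audit.append(("admission_date", "date generalized to year"))
    return out


names = Pseudonymizer("NAME")
audit_log: list = []
rows = [
    {"patient_name": "Jane Doe", "admission_date": date(2023, 3, 14)},
    {"patient_name": "Jane Doe", "admission_date": date(2024, 1, 2)},
]
cleaned = [transform_record(r, names, audit_log) for r in rows]
```

Both cleaned rows carry the same `NAME-…` token, so records for one patient remain linkable, and `audit_log` holds one entry per transformed field, which is the kind of record a de-identification certificate could be built from.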

What you provide

  • Source dataset containing PHI (CSV, XLSX, HL7 FHIR JSON, PDF, or DOCX)
  • Field inventory or data dictionary identifying PHI-bearing columns
  • Specification of which quasi-identifiers — such as rare diagnosis codes or very small ZIP populations — require manual review
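
A field inventory can be as simple as a list of column descriptions. The sketch below assumes a hypothetical `FieldSpec` structure and hypothetical column names; it is not a format anonym.legal requires. The category letters follow the (A) through (R) lettering of §164.514(b)(2)(i).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldSpec:
    column: str
    category: Optional[str]      # Safe Harbor category letter "A".."R", or None if not an identifier
    manual_review: bool = False  # True for quasi-identifiers a person should inspect


inventory = [
    FieldSpec("patient_name", "A"),
    FieldSpec("zip", "B"),
    FieldSpec("admission_date", "C"),
    FieldSpec("mrn", "H"),
    FieldSpec("diagnosis_code", None, manual_review=True),
]


def unclassified_columns(header: list[str], specs: list[FieldSpec]) -> list[str]:
    """Return columns present in the dataset but absent from the inventory."""
    known = {spec.column for spec in specs}
    return [col for col in header if col not in known]
```

Running `unclassified_columns` against the dataset header before upload surfaces any column nobody has classified yet, which is usually where free-text or stray ID fields hide.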

Limitations & cautions

  • Safe Harbor de-identification takes a dataset outside the regulatory definition of PHI but does not guarantee zero re-identification risk; datasets with very small cell sizes for rare diagnoses may retain residual risk and should be reviewed under the Expert Determination method at §164.514(b)(1).
  • The engine removes or transforms the 18 enumerated identifier categories but does not analyze whether the remaining content constitutes a 'unique identifying number, characteristic, or code' under the catch-all provision in §164.514(b)(2)(i)(R) — manual review is recommended for uncommon clinical narratives.
  • Dates of service retained as year-only may reduce utility for time-series analyses; agree on the date-generalization strategy with data consumers before processing.
  • Psychotherapy notes receive heightened protection under the Privacy Rule: most uses and disclosures require patient authorization under §164.508(a)(2). Confirm with legal counsel before including them in a de-identification workflow.
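
The small-cell concern in the first caution can be checked with a simple count. This is a minimal sketch, assuming hypothetical column names and a threshold of 5; the Privacy Rule does not specify a threshold, so `min_size` is a policy choice to agree with whoever performs the review.

```python
from collections import Counter


def small_cells(records: list[dict], quasi_identifiers: list[str], min_size: int = 5) -> dict:
    """Return each quasi-identifier combination shared by fewer than min_size records."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return {combo: n for combo, n in counts.items() if n < min_size}


# Six records share a common diagnosis; one record has a rare one.
rows = [{"dx": "E11.9", "zip3": "100", "year": "2023"} for _ in range(6)]
rows.append({"dx": "E75.22", "zip3": "100", "year": "2023"})

flagged = small_cells(rows, ["dx", "zip3", "year"])
```

Here `flagged` contains only the rare-diagnosis combination, with a count of 1. Combinations like that are candidates for suppression, further generalization, or an Expert Determination review.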

FAQ

Does removing all 18 identifiers guarantee that the dataset is no longer PHI?

Not by itself. Under 45 CFR §164.514(b)(2), Safe Harbor has two conditions: the covered entity removes all 18 identifier categories, and it has no 'actual knowledge that the information could be used alone or in combination with other information to identify an individual who is a subject of the information'. When both conditions are met, the data is not PHI and the Privacy Rule no longer applies. However, Safe Harbor is a bright-line rule, not a statistical guarantee; for high-sensitivity datasets consider supplementing with Expert Determination under §164.514(b)(1).

Why does Safe Harbor treat ZIP codes and dates differently from other identifiers?

The rule recognizes that geographic and temporal quasi-identifiers can be highly re-identifying at fine granularity. Under §164.514(b)(2)(i)(B), ZIP codes must be truncated to their first three digits, and set to 000 where the three-digit area holds 20,000 or fewer people; under (C), all date elements other than year must be removed, and ages over 89 must be aggregated into a '90 or older' category. The engine handles both automatically.
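
The ZIP and age rules reduce to two small functions. The sketch below is illustrative, not the engine's implementation. `RESTRICTED_ZIP3` is an assumed lookup of three-digit prefixes whose combined population is 20,000 or fewer; the entries shown are placeholders only, and a real list has to be built from current Census Bureau data.

```python
# Placeholder entries only; build the real set from current Census Bureau data.
RESTRICTED_ZIP3 = {"036", "059"}


def generalize_zip(zip_code: str, restricted: set[str]) -> str:
    """Keep the first three digits, or return '000' for low-population prefixes."""
    zip3 = zip_code.strip()[:3]
    return "000" if zip3 in restricted else zip3


def generalize_age(age: int) -> str:
    """Aggregate every age over 89 into a single top category."""
    return "90 or older" if age > 89 else str(age)
```

With the placeholder set, `generalize_zip("03601-1234", RESTRICTED_ZIP3)` returns `"000"` and `generalize_zip("10001", RESTRICTED_ZIP3)` returns `"100"`; `generalize_age(89)` returns `"89"` and `generalize_age(90)` returns `"90 or older"`.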

What is the 'any other unique identifying number' catch-all at item (18)?

The catch-all at §164.514(b)(2)(i)(R) covers identifiers that are not specifically enumerated but could uniquely identify an individual, for example a patient portal login ID or a study participant code that maps back to the subject. (An insurance member number, by contrast, already falls under the health plan beneficiary number category.) The engine flags candidate fields matching common ID patterns for reviewer confirmation before final de-identification.
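
One plausible way to flag candidate fields is to look for columns whose values are all distinct and all shaped like an ID. The pattern and function below are assumptions for illustration, not the engine's actual rules, and the output is meant for a human reviewer rather than automatic removal.

```python
import re

# Assumed "ID-like" shape: up to four letters, an optional separator, then five or more digits.
ID_LIKE = re.compile(r"^[A-Za-z]{0,4}[-_]?\d{5,}$")


def flag_candidate_id_columns(rows: list[dict]) -> list[str]:
    """Return columns where every value is unique and matches the ID-like pattern."""
    flagged = []
    columns = list(rows[0].keys()) if rows else []
    for col in columns:
        values = [str(r[col]) for r in rows]
        all_unique = len(set(values)) == len(values)
        id_shaped = all(ID_LIKE.match(v) for v in values)
        if all_unique and id_shaped:
            flagged.append(col)
    return flagged


sample = [
    {"portal_id": "PT-100234", "dx": "E11.9"},
    {"portal_id": "PT-100235", "dx": "E11.9"},
]
candidates = flag_candidate_id_columns(sample)
```

On the sample, only `portal_id` is flagged; `dx` repeats across rows and does not match the pattern. A heuristic like this will miss identifiers with unusual shapes, so it narrows the reviewer's work without replacing it.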

Can we use Safe Harbor de-identified data for AI model training without a Business Associate Agreement?

Once a dataset has been de-identified under the Safe Harbor standard it is no longer PHI and therefore not subject to the BAA requirement of 45 CFR §164.504(e). However, if the vendor performing the training will have access to PHI at any point — for example, to perform the de-identification itself — a BAA is required for that processing step. Confirm the data-flow boundary with legal counsel before transmitting.

Is Safe Harbor de-identification the same as anonymization under GDPR?

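No. The HIPAA Safe Harbor method is a US federal regulatory standard defined in 45 CFR §164.514(b)(2); it sets specific enumerated criteria for US healthcare data. GDPR anonymization is a broader, risk-based EU standard: under Recital 26, data is anonymous only if the individual is not identifiable by any means reasonably likely to be used, and pseudonymized data that can be re-linked through a retained mapping remains personal data. Data de-identified under Safe Harbor for US purposes should receive a separate assessment if it is also subject to GDPR obligations.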
