HIPAA Safe Harbor De-Identification: Detecting Hospital-Specific MRN Formats Without Engineering
HIPAA Safe Harbor de-identification requires removal of "medical record numbers" as one of its 18 identifier categories. This seems straightforward until you encounter the actual operational challenge: medical record numbers are not standardized.
Epic generates MRNs in one format. Cerner uses a different format. Meditech uses another. Hospital networks assign their own facility codes. Regional health information organizations create yet more formats. The result: a standard PII tool scanning a clinical document for "medical record numbers" has no way to know what format your institution uses, and unless it has been configured for that format, it will miss them entirely.
This isn't a hypothetical gap. Healthcare IT teams conducting HIPAA de-identification assessments regularly discover that MRNs in "de-identified" datasets are still present because the anonymization tool was configured only for standard PII categories.
The MRN Standardization Problem
US healthcare has no national standard for medical record number format. Each institution (or EHR vendor) defines its own.
Commonly observed patterns include:
- Epic-style: 8-12 digit numeric (e.g., 123456789)
- Cerner-style: Hospital code prefix + numeric (e.g., MGH-987654)
- Regional networks: Facility code + year + sequence (e.g., HOSP-2023-456789)
- Veterans Affairs: 9-digit with specific check digit patterns
- Pediatric systems: Patient-type prefix + numeric (e.g., PED-12345678)
None of these match a universal "medical record number" regex pattern because no such universal pattern exists.
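To make this concrete, here is a minimal sketch that gives each format listed above its own regular expression. The exact prefixes and lengths are illustrative assumptions taken from the examples, not vendor specifications. The VA format is omitted because its check-digit rule is not specified here. Each example matches exactly one pattern, which is the practical meaning of "no universal MRN pattern."

```python
import re

# Illustrative patterns for the example formats above. Prefixes and lengths
# are assumptions; each institution must supply its own.
MRN_FORMATS = {
    "epic_style": r"\d{8,12}",                  # bare 8-12 digit number
    "cerner_style": r"[A-Z]{2,5}-\d{6}",        # hospital code + numeric
    "regional_network": r"[A-Z]{2,5}-\d{4}-\d{6}",  # facility + year + sequence
    "pediatric": r"PED-\d{8}",                  # patient-type prefix + numeric
}

EXAMPLES = ["123456789", "MGH-987654", "HOSP-2023-456789", "PED-12345678"]


def matching_formats(mrn: str) -> list[str]:
    """Return the names of every format whose pattern matches the whole string."""
    return [name for name, pat in MRN_FORMATS.items() if re.fullmatch(pat, mrn)]


for mrn in EXAMPLES:
    # Each example matches only its own format's pattern.
    print(mrn, matching_formats(mrn))
```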
What standard PII tools detect: Standard implementations of HIPAA de-identification tools focus on the identifiers with standardized formats: SSNs (XXX-XX-XXXX), phone numbers (XXX-XXX-XXXX), email addresses, dates. MRNs, account numbers, and certificate/license numbers — HIPAA categories 8, 10, and 11 — are institution-specific and require custom configuration.
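The gap is easy to reproduce. The sketch below uses a hypothetical "standard" detector that knows only fixed-format identifiers (SSN, phone, email). It is not the rule set of any particular product. Run on a clinical note, it redacts the SSN and phone number but leaves the MRN untouched.

```python
import re

# A hypothetical "standard" detector: fixed-format identifiers only.
STANDARD_PATTERNS = {
    "SSN": r"\b\d{3}-\d{2}-\d{4}\b",
    "PHONE": r"\b\d{3}-\d{3}-\d{4}\b",
    "EMAIL": r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b",
}


def redact(text: str, patterns: dict[str, str]) -> str:
    """Replace every match of each pattern with a [LABEL] placeholder."""
    for label, pattern in patterns.items():
        text = re.sub(pattern, f"[{label}]", text)
    return text


note = "Pt MRN HOSP-2023-456789, SSN 123-45-6789, call 555-867-5309."
print(redact(note, STANDARD_PATTERNS))
# → Pt MRN HOSP-2023-456789, SSN [SSN], call [PHONE].
```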
The Compliance Risk
A regional hospital network prepares to share de-identified patient data with a university research partner. Their EHR generates MRNs in the format: HOSP-YYYY-XXXXXX (hospital code, 4-digit year, 6-digit sequence number).
They run the dataset through their standard HIPAA de-identification tool. The tool removes:
- Patient names ✓
- Dates (all elements except year) ✓
- Phone numbers ✓
- Email addresses ✓
- Geographic data smaller than state ✓
- SSNs ✓
The tool does not remove MRNs — because HOSP-2023-456789 doesn't match any built-in MRN pattern.
The researcher receives the dataset, runs a join against their internal records (which include MRNs from referrals at the same hospital), and can re-identify a significant percentage of the "de-identified" patients. The hospital network has a HIPAA breach.
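The re-identification step needs no sophistication. An exact-match join on the leaked MRN is enough. This toy sketch uses invented records (the diagnoses, names, and MRNs are made up) to show how a surviving MRN re-attaches an identity to a "de-identified" row.

```python
# Invented records for illustration only.
research_extract = [  # "de-identified" rows that still carry MRNs
    {"mrn": "HOSP-2023-456789", "diagnosis": "E11.9"},
    {"mrn": "HOSP-2023-456790", "diagnosis": "I10"},
]
referral_records = {  # partner's own records: MRN -> patient name
    "HOSP-2023-456789": "Jane Doe",
}

# An exact-match join on the leaked MRN re-attaches the identity.
reidentified = [
    {**row, "name": referral_records[row["mrn"]]}
    for row in research_extract
    if row["mrn"] in referral_records
]
print(reidentified)
```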
This failure mode is not hypothetical: a de-identification tool removes only the identifier formats it has been configured to detect.
Custom Entity Creation: The Solution
The solution is to define the MRN format as a custom entity in the anonymization tool. The compliance officer (not an engineer) can:
- Identify the institution's MRN format: "Hospital identifier starting with HOSP, then a dash, then a 4-digit year, then a dash, then a 6-digit number"
- Use an AI pattern assistant to generate the appropriate regex: HOSP-\d{4}-\d{6}
- Validate against sample documents: upload 20 discharge summaries and verify that the pattern catches all MRNs
- Save it as a custom entity: "Hospital MRN", now available in all processing modes
- Include it in the HIPAA de-identification preset: the standard preset plus the custom MRN entity now covers this institution's MRNs alongside the standard Safe Harbor identifiers
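As a plain-Python illustration of what such a custom entity amounts to, here is a minimal sketch. It is not the configuration format of any specific anonymization tool. The name `HOSPITAL_MRN` and the placeholder text are hypothetical. Two details matter. First, the backslashes: without them, `d{4}` matches the letter "d" four times. Second, the word boundaries keep the pattern from firing inside longer tokens.

```python
import re

# Hypothetical custom entity for the HOSP-YYYY-XXXXXX format.
# \d is required: a bare "d{4}" would match the literal text "dddd".
# \b keeps the pattern from matching inside longer tokens.
HOSPITAL_MRN = re.compile(r"\bHOSP-\d{4}-\d{6}\b")


def redact_mrns(text: str) -> str:
    """Replace every Hospital MRN with a placeholder label."""
    return HOSPITAL_MRN.sub("[HOSPITAL_MRN]", text)


print(redact_mrns("Admitted 03/2023, MRN HOSP-2023-456789; prior HOSP-2019-000123."))
# → Admitted 03/2023, MRN [HOSPITAL_MRN]; prior [HOSPITAL_MRN].
```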
Timeline: 3 days of compliance officer time vs. roughly 3 months waiting in an engineering ticket queue for custom code development.
Example: Regional Hospital Network Implementation
- Organization: 15-facility regional hospital network
- MRN format: HOSP-YYYY-XXXXXX (appears in thousands of discharge summary PDFs)
- Compliance challenge: preparing a research dataset for a university partner (HIPAA data use agreement executed; requires de-identification)
- Previous approach: external HIPAA de-identification vendor ($120,000/year)
- Gap discovered: the vendor tool did not detect the institution-specific MRN format
New workflow:
- Compliance officer defines MRN pattern (20 minutes)
- AI assists with regex validation (5 minutes)
- Test against 50 sample discharge summaries (30 minutes)
- Confirm all MRNs detected, no false positives (10 minutes)
- Add to HIPAA de-identification preset alongside standard entities
- Process full 50,000-record research dataset in batch
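The "confirm all MRNs detected, no false positives" step can be made explicit by comparing detector output against MRNs a reviewer marked by hand in each sample document. This is a minimal sketch, assuming such hand-labeled samples exist. The sample texts are invented, and comparison is per document on unique MRN strings. A reasonable gate before the batch run is `missed == 0` and `false_positives == 0`.

```python
import re

HOSPITAL_MRN = re.compile(r"\bHOSP-\d{4}-\d{6}\b")


def validate(samples: list[tuple[str, set[str]]]) -> dict[str, int]:
    """Compare detector output with hand-labeled MRNs for each sample.

    Each sample is (document_text, set_of_MRNs_a_reviewer_marked).
    Duplicates within one document are counted once.
    """
    missed = false_pos = 0
    for text, expected in samples:
        found = set(HOSPITAL_MRN.findall(text))
        missed += len(expected - found)      # labeled but not detected
        false_pos += len(found - expected)   # detected but not labeled
    return {"documents": len(samples), "missed": missed, "false_positives": false_pos}


samples = [  # invented sample documents with reviewer labels
    ("MRN: HOSP-2023-456789. Follow up in 2 weeks.", {"HOSP-2023-456789"}),
    ("Transferred from HOSP-2022-001122 (see note).", {"HOSP-2022-001122"}),
    ("No identifiers in this addendum.", set()),
]
print(validate(samples))
# → {'documents': 3, 'missed': 0, 'false_positives': 0}
```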
Total time to close the compliance gap: 1 afternoon.
Multi-Facility Organizations: Different MRN Formats Per Facility
Hospital networks acquired through merger often have multiple EHR systems — and multiple MRN formats from legacy installations.
Handling multiple MRN formats:
Create separate custom entities for each format:
- "MRN Format A (Epic)" — 8-digit numeric
- "MRN Format B (legacy Cerner)" — prefix + 7-digit numeric
- "MRN Format C (acquired affiliate)" — state code + year + sequence
A preset that includes all three custom entities plus standard HIPAA identifiers covers the full network's de-identification requirements. When applied to a batch containing documents from any facility, all MRN formats are caught.
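A minimal sketch of such a preset, assuming concrete shapes for the three formats. The "LGC" prefix, the two-letter state code, and all lengths are hypothetical. Patterns run in order, from most specific to least specific. The placeholder labels contain no digits, so later patterns cannot re-match them. A bare 8-digit pattern will also catch 8-digit dates such as 20230115, which is one reason sample validation matters.

```python
import re

# Hypothetical preset: a standard identifier plus one custom entity per
# legacy MRN format. Order matters: most specific patterns first.
PRESET = {
    "SSN": r"\b\d{3}-\d{2}-\d{4}\b",
    "MRN_C_AFFILIATE": r"\b[A-Z]{2}-\d{4}-\d{6}\b",  # state code + year + sequence
    "MRN_B_LEGACY_CERNER": r"\b[A-Z]{2,4}-\d{7}\b",  # prefix + 7 digits
    "MRN_A_EPIC": r"\b\d{8}\b",  # bare 8 digits (also matches 8-digit dates!)
}


def apply_preset(text: str) -> str:
    """Apply every entity in PRESET, in order, replacing matches with labels."""
    for label, pattern in PRESET.items():
        text = re.sub(pattern, f"[{label}]", text)
    return text


batch = [
    "Epic chart 12345678 reviewed.",
    "Legacy record LGC-1234567 merged.",
    "Affiliate ID OH-2021-004512 on file.",
]
for doc in batch:
    print(apply_preset(doc))
```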
Beyond MRNs: Other Institution-Specific Identifiers
The same custom entity approach applies to other HIPAA Safe Harbor categories that organizations implement with non-standard formats:
Health plan beneficiary numbers (Category 9): Insurance member IDs are carrier-specific. Aetna, Blue Cross, and UnitedHealthcare each use different formats. A hospital system processing billing records needs custom patterns for each payer it works with.
Account numbers (Category 10): Hospital account numbers for billing (not clinical MRNs) are institution-specific.
Certificate/license numbers (Category 11): Physician DEA numbers have a standard format. State medical license numbers do not — each state licensing board uses a different format.
Device identifiers and serial numbers (Category 13): Medical device serial numbers are manufacturer-specific.
For each of these categories, custom entity creation allows compliance teams to close detection gaps without engineering resources.
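A custom entity can pair a pattern with a validator to cut false positives. DEA numbers are a good illustration because their check digit is publicly documented. Add the 1st, 3rd, and 5th digits to twice the sum of the 2nd, 4th, and 6th digits. The last digit of that total must equal the 7th digit. This sketch simplifies the format to two letters plus seven digits and ignores the rules on which letters are allowed. Carrier member IDs and state license numbers have no such shared rule, so each needs its own institution-supplied pattern.

```python
import re

# Simplified DEA shape: two letters + seven digits (last digit is a check digit).
DEA_CANDIDATE = re.compile(r"\b[A-Z]{2}\d{7}\b")


def dea_checksum_ok(dea: str) -> bool:
    """(d1 + d3 + d5) + 2 * (d2 + d4 + d6); its last digit must equal d7."""
    d = [int(c) for c in dea[2:]]
    total = d[0] + d[2] + d[4] + 2 * (d[1] + d[3] + d[5])
    return total % 10 == d[6]


def find_dea_numbers(text: str) -> list[str]:
    """Return candidates that match the shape AND pass the check digit."""
    return [m for m in DEA_CANDIDATE.findall(text) if dea_checksum_ok(m)]


print(find_dea_numbers("Prescriber AB1234563; ref code ZZ1234567."))
# → ['AB1234563']
```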
Validation: Verifying Safe Harbor Compliance
HIPAA's Safe Harbor method requires both removal of the 18 identifier categories and that the covered entity "does not have actual knowledge that the information could be used alone or in combination with other information to identify an individual who is a subject of the information."
For a compliance officer applying custom entity detection, validation is the demonstration that all 18 categories are covered:
- Process a sample of 50-100 documents from the research dataset
- Manually review the processed output — does anything look like a potential identifier?
- Run the output through a second detection pass to catch any patterns the first pass missed
- Document the validation process
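The sampling and second-pass steps can be sketched as follows. The heuristic used here flags any remaining token containing six or more consecutive digits. That is an assumption chosen for illustration, not a standard rule. It is deliberately broader than the custom entities so that it surfaces candidates for manual review. In the invented example, it catches a billing account number (Category 10) that the MRN entity was never meant to detect.

```python
import random
import re

# Assumed heuristic: any token (letters, digits, dashes) containing a run of
# 6+ digits is worth a human look after processing.
RESIDUAL = re.compile(r"[\w-]*\d{6,}[\w-]*")


def second_pass(processed_docs: list[str], sample_size: int = 100,
                seed: int = 0) -> list[tuple[int, str]]:
    """Sample processed documents and flag residual digit-heavy tokens."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    k = min(sample_size, len(processed_docs))
    flagged = []
    for idx in rng.sample(range(len(processed_docs)), k):
        for token in RESIDUAL.findall(processed_docs[idx]):
            flagged.append((idx, token))
    return sorted(flagged)


processed = [  # invented de-identified output
    "MRN [HOSPITAL_MRN], seen by cardiology.",
    "Billing acct 00451277 pending.",
    "No follow-up needed.",
]
print(second_pass(processed))
# → [(1, '00451277')]
```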
The custom entity configuration, validation sampling results, and processing metadata together constitute the documentation record for Safe Harbor de-identification.
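One way to keep that record tamper-evident is to bundle the configuration, validation results, and run metadata into a single JSON document that includes a hash of the exact entity configuration used. This is a minimal sketch. The field names and the dataset name are hypothetical. The regulatory citation (45 CFR 164.514(b)(2)) is the Safe Harbor provision.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_audit_record(entities: dict[str, str], sample_results: dict,
                       dataset_name: str) -> str:
    """Bundle entity config, validation results, and run metadata as JSON."""
    config_blob = json.dumps(entities, sort_keys=True).encode("utf-8")
    record = {
        "dataset": dataset_name,
        "method": "HIPAA Safe Harbor (45 CFR 164.514(b)(2))",
        "custom_entities": entities,
        # Hash ties the documentation to the exact patterns that were run.
        "config_sha256": hashlib.sha256(config_blob).hexdigest(),
        "validation": sample_results,
        "generated_at": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(record, indent=2)


record_json = build_audit_record(
    {"HOSPITAL_MRN": r"\bHOSP-\d{4}-\d{6}\b"},        # hypothetical entity set
    {"documents": 100, "missed": 0, "false_positives": 0},
    "research-extract-example",                       # hypothetical dataset name
)
print(record_json)
```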
Conclusion
HIPAA Safe Harbor de-identification is not accomplished by standard PII tools configured for generic patterns. Medical record numbers — one of the 18 required categories — are institution-specific and require custom detection for compliance.
Custom entity creation closes this gap in hours rather than months. Compliance officers can define institution-specific patterns, validate against sample documents, and produce truly Safe Harbor-compliant output without engineering resources.
The compliance gap between "we ran a HIPAA de-identification tool" and "we actually removed all 18 Safe Harbor identifiers" is often just one unconfigured custom entity.