By · Last updated 2026-06-04

Healthcare

Detecting HIPAA MRNs Without Advanced Regex Skills

Every hospital formats its MRNs differently. Memorial uses MRN:XXXXXXX, St. Mary's uses PT-YYYYY, and University Hospital uses UHN-XXXXXXXXXX.

June 4, 2026 · 6 min read
HIPAA de-identification · MRN pattern · healthcare IT · AI pattern generation · PHI detection


Your hospital's MRN format isn't in any standard PII tool. Here is how to add it in five minutes. No coding required.

Healthcare IT teams face a HIPAA problem that other sectors don't. The identifier they most often need to find, the medical record number (MRN), is defined by each hospital itself. There is no national standard.

Every HIPAA de-identification project therefore needs its own configuration. Without it, MRNs slip through "de-identified" files undetected.

The MRN Problem in Health System Networks

Hospital networks built through mergers still run older EHR systems. Each system has its own MRN format:

  • Memorial Hospital (Epic): MRN:XXXXXXX - a 7-digit number with a prefix
  • St. Mary's (Cerner): PT-YYYYY - 5 digits with a patient prefix
  • University Hospital (Meditech): UHN-XXXXXXXXXX - a 10-character alphanumeric combination
  • Clinic (standalone EMR): C\d{5} - the letter C plus 5 digits

HIPAA Safe Harbor requires removing all 18 identifier types. Category 8 is medical record numbers. A tool that doesn't recognize your format will miss them. The file looks clean. It isn't.

The ServiceNow healthcare community has flagged this exact problem. Standard tools catch Social Security numbers and phone numbers, but facility MRNs consistently slip past them.
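To make the failure concrete, here is a minimal Python sketch of a detector that only knows generic formats. The two patterns are simplified stand-ins written for illustration (an assumption, not any vendor's built-in rules): one for US Social Security numbers and one for US phone numbers.

```python
import re

# Simplified stand-ins for a standard tool's built-in recognizers (hypothetical patterns).
STANDARD_PATTERNS = {
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE_NUMBER": re.compile(r"\(\d{3}\) \d{3}-\d{4}"),
}

def detect(text: str) -> list[tuple[str, str]]:
    """Return (entity, matched text) pairs for every standard pattern hit."""
    hits = []
    for entity, pattern in STANDARD_PATTERNS.items():
        hits.extend((entity, m.group()) for m in pattern.finditer(text))
    return hits

note = "Pt seen 3/2. SSN 123-45-6789, call (555) 123-4567. MRN:1234567"
print(detect(note))
# [('US_SSN', '123-45-6789'), ('PHONE_NUMBER', '(555) 123-4567')]
```

Nothing in the output flags MRN:1234567, so a report built on these hits would call the note clean.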

The Regex Barrier

Adding custom rules to Microsoft Presidio, the open-source foundation of many HIPAA tools, takes real expertise:

  • You need to know the PatternRecognizer class
  • You need to write regex in Python syntax
  • You need to set up YAML config files
  • You need to tune confidence scores
  • You need to test and debug Python scripts
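In Presidio itself, a rule like this is expressed with the `Pattern` and `PatternRecognizer` classes from `presidio_analyzer`. The standard-library sketch below only imitates the ingredients those classes bundle (entity name, regex, base score, context words) so it runs without Presidio installed. The `PatternRule` class and its score-boost logic are hypothetical simplifications, not Presidio's actual context enhancer.

```python
import re
from dataclasses import dataclass, field

@dataclass
class PatternRule:
    """Hypothetical stand-in for what a Presidio-style recognizer bundles together."""
    entity: str
    regex: str
    score: float                        # base confidence for a bare regex hit
    context: list[str] = field(default_factory=list)
    context_boost: float = 0.3          # assumed boost when a context word precedes the hit
    window: int = 30                    # characters to look back for context words

    def find(self, text: str) -> list[tuple[str, float]]:
        results = []
        for m in re.finditer(self.regex, text):
            before = text[max(0, m.start() - self.window):m.start()].lower()
            score = self.score
            if any(word.lower() in before for word in self.context):
                score = min(1.0, score + self.context_boost)
            results.append((m.group(), round(score, 2)))
        return results

clinic_rule = PatternRule(
    entity="CLINIC_MRN",
    regex=r"\bC\d{5}\b",
    score=0.4,                          # weak on its own: "C + 5 digits" is a generic shape
    context=["mrn", "record", "chart"],
)

print(clinic_rule.find("Chart no. C12345 reviewed. Room C54321 booked."))
# [('C12345', 0.7), ('C54321', 0.4)]
```

Tuning `score`, `context`, and `window` per facility is exactly the kind of work that lands in an engineering queue.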

The compliance officer who knows the MRN format can't do this alone. The fix ends up as an engineering request. The wait is 6 to 8 weeks. Meanwhile the gap stays open.

AI Pattern Generation

There is a faster way. Describe the pattern in plain language. Get a working regex.

Steps:

  1. Open the custom entity builder
  2. Enter examples: "Our MRNs look like this: MRN:1234567, MRN:9876543, MRN:0001234"
  3. The AI generates the rule: MRN:\d{7}
  4. Test it on 10 sample records
  5. All MRNs found? Save and deploy.
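The AI step itself can't be reproduced here, but the core idea, generalizing a few examples into one pattern, can be sketched with a deliberately naive heuristic. This is an assumption for illustration, not how the custom entity builder works: each digit run becomes `\d{n}`, everything else stays literal, and all examples must agree on a single shape. The last lines mimic step 4 on made-up sample text.

```python
import re

def shape(example: str) -> str:
    """Turn one example into a regex: digit runs become \\d{n}, everything else is literal."""
    parts = []
    for run in re.finditer(r"\d+|\D+", example):
        token = run.group()
        parts.append(rf"\d{{{len(token)}}}" if token.isdigit() else re.escape(token))
    return "".join(parts)

def infer_pattern(examples: list[str]) -> str:
    """Naive stand-in for AI pattern generation: every example must share one shape."""
    shapes = {shape(e) for e in examples}
    if len(shapes) != 1:
        raise ValueError(f"examples disagree on format: {sorted(shapes)}")
    return shapes.pop()

pattern = infer_pattern(["MRN:1234567", "MRN:9876543", "MRN:0001234"])
print(pattern)  # MRN:\d{7}

# Step 4 in miniature: does the rule find every MRN in (made-up) sample text?
sample_text = "Admit note MRN:4455667. Follow-up for MRN:7788990."
found = re.findall(rf"\b{pattern}\b", sample_text)
print(found)  # ['MRN:4455667', 'MRN:7788990']
```

A real generator has to handle variable lengths and mixed letter/digit runs; raising an error on disagreeing examples is the simplest safe behavior for a sketch.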

For a network with four MRN formats:

  • Memorial Hospital -> MRN:\d{7}
  • St. Mary's -> PT-\d{5}
  • University Hospital -> UHN-[A-Z0-9]{10}
  • Clinic -> C\d{5}

Create four custom entities. Group them into a preset. Run it on all files. Time required: one afternoon.
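A preset of these four entities can be sketched as a plain dictionary of regexes run over every file. The entity names, file names, and sample text are hypothetical, and the `\b` word boundaries around each pattern are an addition made here to keep matches from firing inside longer tokens.

```python
import re

# Hypothetical preset: one custom entity per facility format.
MRN_PRESET = {
    "MEMORIAL_MRN": r"MRN:\d{7}",
    "ST_MARYS_MRN": r"PT-\d{5}",
    "UNIVERSITY_MRN": r"UHN-[A-Z0-9]{10}",
    "CLINIC_MRN": r"C\d{5}",
}

# Compile once; \b on both sides stops matches inside longer tokens.
COMPILED = {name: re.compile(rf"\b{rx}\b") for name, rx in MRN_PRESET.items()}

def scan(files: dict[str, str]) -> dict[str, list[tuple[str, str]]]:
    """Map each file name to the (entity, match) pairs the preset finds in it."""
    return {
        fname: [(name, m.group()) for name, rx in COMPILED.items() for m in rx.finditer(text)]
        for fname, text in files.items()
    }

files = {
    "memorial_001.txt": "MRN:1234567 admitted 2024-03-02",
    "stmarys_014.txt": "Patient PT-54321, transferred from UHN-AB12CD34EF",
    "clinic_007.txt": "Chart C10293; follow-up in 2 weeks",
}
for fname, hits in scan(files).items():
    print(fname, hits)
```

Because every entity runs on every file, a patient transferred between facilities is caught even when one note mixes two formats.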

See Custom MRN Detection in HIPAA Pipelines Without Coding for the full walkthrough.

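Validation for Safe Harbor

HIPAA Safe Harbor states that the covered entity must not have "actual knowledge" that the remaining information could be used, alone or in combination with other information, to identify an individual (45 CFR §164.514(b)(2)(ii)).

Validation is how you demonstrate that your rules, built-in and custom together, cover all 18 identifier types.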

Step 1: Collect samples. Pull 100 records from each facility. Mix time periods and departments.
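One way to pull 100 records per facility while mixing periods and departments is stratified sampling. The sketch below assumes each record is a dict with hypothetical `site`, `period`, and `department` keys, and it rotates across (period, department) groups so no single group dominates. A fixed seed keeps the sample reproducible for the validation record.

```python
import random
from collections import defaultdict

def stratified_sample(records: list[dict], per_site: int = 100, seed: int = 42) -> list[dict]:
    """Pick up to per_site records per site, rotating across (period, department) groups."""
    rng = random.Random(seed)  # fixed seed: the same sample can be re-drawn during an audit
    by_site: dict = defaultdict(lambda: defaultdict(list))
    for rec in records:
        by_site[rec["site"]][(rec["period"], rec["department"])].append(rec)

    sample = []
    for strata in by_site.values():
        buckets = list(strata.values())
        for bucket in buckets:
            rng.shuffle(bucket)
        picked = []
        # Round-robin: take one record from each group in turn until per_site is reached.
        while len(picked) < per_site and any(buckets):
            for bucket in buckets:
                if bucket and len(picked) < per_site:
                    picked.append(bucket.pop())
        sample.extend(picked)
    return sample

# Hypothetical corpus: 4 sites x 3 years x 4 departments x 20 notes = 960 records.
records = [
    {"site": s, "period": p, "department": d, "note_id": f"{s}/{p}/{d}/{i}"}
    for s in ("Memorial", "St. Mary's", "University", "Clinic")
    for p in ("2022", "2023", "2024")
    for d in ("ED", "Surgery", "Oncology", "Cardiology")
    for i in range(20)
]
sample = stratified_sample(records)
print(len(sample))  # 400
```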

Step 2: Run detection. Process all 400 documents with your custom rules.

Step 3: Human review. Check 20 documents by hand (a 5% sample). Look for missed MRNs and false matches.
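The arithmetic behind step 3 is simple: 5% of 400 documents is 20. The sketch below draws that review sample reproducibly and turns the reviewer's tallies into precision (how many flagged items were real MRNs) and recall (how many real MRNs were flagged). The function names and the counts in the example are hypothetical.

```python
import random

def review_sample(doc_ids: list[str], percent: int = 5, seed: int = 7) -> list[str]:
    """Draw a reproducible percent-sized sample of documents for manual review."""
    k = max(1, len(doc_ids) * percent // 100)  # integer math: 400 * 5 // 100 == 20
    return random.Random(seed).sample(doc_ids, k)

def review_metrics(confirmed: int, false_matches: int, missed: int) -> tuple[float, float]:
    """Precision and recall from the reviewer's tallies over the sampled documents."""
    flagged = confirmed + false_matches  # everything the rules reported
    actual = confirmed + missed          # every real MRN the reviewer found
    # Convention assumed here: with nothing to count, report a perfect score.
    precision = confirmed / flagged if flagged else 1.0
    recall = confirmed / actual if actual else 1.0
    return precision, recall

docs = [f"doc-{i:03d}" for i in range(400)]
to_review = review_sample(docs)
print(len(to_review))  # 20

# Hypothetical tallies from reviewing those 20 documents.
precision, recall = review_metrics(confirmed=38, false_matches=2, missed=1)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

For Safe Harbor, recall is the number that matters most: every missed MRN is an identifier left in the file.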

Step 4: Refine the rules. Missed MRNs? Widen the pattern. Too many false matches? Add word boundaries.
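Both refinements show up on one line of made-up text. The `C\d{5}` case shows a false match inside a longer token, fixed with `\b`. The MRN case assumes a hypothetical variant where staff sometimes type a space after the colon, which the strict pattern misses.

```python
import re

text = "Ref ABC12345 and C54321; also MRN: 7654321 and MRN:1234567"

# Too many false matches: C\d{5} also fires inside the reference "ABC12345".
print(re.findall(r"C\d{5}", text))             # ['C12345', 'C54321']
print(re.findall(r"\bC\d{5}\b", text))         # ['C54321']

# Missed MRNs: the strict rule skips the (hypothetical) "MRN: 7654321" variant.
print(re.findall(r"MRN:\d{7}", text))          # ['MRN:1234567']
print(re.findall(r"\bMRN:?\s?\d{7}\b", text))  # ['MRN: 7654321', 'MRN:1234567']
```

Widen only as far as the missed cases require: each optional character also widens what can match by accident, so rerun the review sample after every change.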

Step 5: Document it. Record the rule, sample size, results, and date. This record is your Safe Harbor documentation.
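A validation record can be as simple as one JSON object per rule. The field names below are assumptions, not a required schema. What matters is that the rule, sample size, results, and date are stored together.

```python
import json
from datetime import date

def validation_record(entity: str, pattern: str, sample_size: int, reviewed: int,
                      missed: int, false_matches: int, run_date: date) -> dict:
    """Bundle one rule's validation results into a JSON-serializable record."""
    return {
        "entity": entity,
        "pattern": pattern,
        "sample_size": sample_size,
        "documents_reviewed": reviewed,
        "missed_mrns": missed,
        "false_matches": false_matches,
        "date": run_date.isoformat(),
    }

# Hypothetical results for the Memorial rule.
record = validation_record("MEMORIAL_MRN", r"\bMRN:\d{7}\b", 400, 20, 0, 1, date(2026, 6, 4))
print(json.dumps(record, indent=2))
```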

See Explainable Redaction and HIPAA Audit Trails for more on what to document.

Full Safe Harbor Coverage

Once MRN detection is fixed, check all 18 categories.

| Category | Standard tools | Custom configuration needed? |
|---|---|---|
| 1. Names | NER model | No |
| 2. Geographic data | Location detection | No for states; yes for facility codes |
| 3. Dates | Date detection | No |
| 4. Phone numbers | Phone detection | No |
| 5. Fax numbers | Phone detection | No |
| 6. Email addresses | Email detection | No |
| 7. Social Security numbers | SSN detection | No |
| 8. Medical record numbers | Not built in | Yes - facility-specific |
| 9. Health plan beneficiary numbers | Partial | Often yes - payer-specific |
| 10. Account numbers | Partial | Often yes - billing formats |
| 11. Certificate/license numbers | Partial | Often yes - state-specific |
| 12. Vehicle identifiers | Partial | Rare in clinical documents |
| 13. Device identifiers | Partial | Yes, if devices appear in records |
| 14. Web URLs | URL detection | No |
| 15. IP addresses | IP detection | No |
| 16. Biometric identifiers | Text context | Rare in discharge summaries |
| 17. Photographs | Images only | Out of scope for text |
| 18. Other unique identifiers | Not built in | Yes - facility-specific |

In clinical text, categories 8, 9, 10, and 18 most often need custom configuration.

Clinical Document Context

Discharge summaries, clinical notes, and operative reports are the main files shared for research. They contain:

  • MRNs in headers and footers
  • Account numbers in billing sections
  • Dates for every event: admission, procedure, lab, medication
  • Physician names and DEA numbers
  • Referring physician information
  • Insurance subscriber IDs

Custom rules for facility-specific formats, combined with built-in rules for standard formats, give you full Safe Harbor coverage.
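Combining the two rule sets can be sketched as one merged dictionary passed to a single redaction pass. The built-in patterns here are simplified stand-ins (an assumption, not any product's actual rules), and the custom ones reuse two of the facility formats listed earlier.

```python
import re

# Simplified stand-ins for built-in rules (hypothetical patterns).
BUILT_IN = {
    "US_SSN": r"\b\d{3}-\d{2}-\d{4}\b",
    "EMAIL_ADDRESS": r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b",
}
# Facility-specific custom rules.
CUSTOM = {
    "MEMORIAL_MRN": r"\bMRN:\d{7}\b",
    "CLINIC_MRN": r"\bC\d{5}\b",
}

def redact(text: str, rules: dict[str, str]) -> str:
    """Replace every match of every rule with an <ENTITY> placeholder, rule by rule."""
    for entity, rx in rules.items():
        text = re.sub(rx, f"<{entity}>", text)
    return text

note = "MRN:1234567 | Pt SSN 123-45-6789 | contact jdoe@example.org | clinic chart C10293"
print(redact(note, BUILT_IN))
# MRN:1234567 | Pt SSN <US_SSN> | contact <EMAIL_ADDRESS> | clinic chart C10293
print(redact(note, {**BUILT_IN, **CUSTOM}))
# <MEMORIAL_MRN> | Pt SSN <US_SSN> | contact <EMAIL_ADDRESS> | clinic chart <CLINIC_MRN>
```

Sequential `re.sub` works while the rules cannot match the same span. If two rules can overlap, a pipeline needs to resolve overlapping matches before replacing.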

Conclusion

HIPAA de-identification without custom rules is not Safe Harbor de-identification. Every facility's MRN format is unique. Standard tools miss them. The compliance gap is real, and it stays open until you close it.

AI pattern generation cuts the fix from 6 to 8 weeks of engineering work to one afternoon of compliance work. Describe the format. Test it on real records. Deploy it. Done.


Ready to protect your data?

Start anonymizing PII with 285+ entity types in 48 languages.

About this page

We update this page when our platform or the law changes.

Read our founder note for how we work.

Each change shows up in the timestamp at the top.


We follow these rules

  • GDPR (EU 2016/679).
  • ISO/IEC 27001:2022.
  • NIS2 (EU 2022/2555).
  • HIPAA safe harbor under 45 CFR § 164.514(b)(2).

Our promise

We do not sell your data.

We do not train models on your text.

We store your files in Germany.

You can delete your account at any time.

You own your work.

Where we run

Our servers live in Falkenstein, Germany.

We use Hetzner. They hold ISO 27001 certification.

All data stays in the EU.

Backups run every day.

Need help?

Email support@anonym.legal.

We reply within one business day.

How we test

We run a full check suite on every release.

Each surface gets its own sweep script and report.

Human reviewers spot-check the output each week.

We track recall and precision on a labelled set.

Bad runs block the deploy.

What we never do

  • We never sell your information to third parties.
  • We never train models on what you upload.
  • We never keep your work after you delete it.
  • We never share keys with any outside firm.
  • We never run ads inside the product.

Plans in plain words

We sell credits, not seats.

One credit covers one short job.

Long jobs use a few credits each.

You can top up at any time.

Unused credits roll over each month.

Read the plans page for current rates.

Who built this

A small team of engineers and lawyers built this.

We ship from Europe and work in the open.

Our founder note spells out why we started.


How the parts fit

A browser add-on cleans text inside Chrome.

A Word plug-in handles drafts in Office.

A small desktop tool works on whole folders.

An agent protocol link feeds large models safely.

All four share one core engine and one rule set.

Words from our team

We started this work after a lunch about cookies.

One friend kept getting odd ads on her phone.

We asked why a court file leaked through a draft.

We sketched the first build on a napkin that week.

By month three we had a tiny demo for a friend.

She used it on her first case the next day.

Common questions we hear

Can the tool read scanned PDFs? Yes, with OCR.

Does it work on long files? Yes, in small chunks.

Can I roll my own rule set? Yes, save it as a preset.

Does it run offline? The desktop build runs offline.

Do you keep my files? No, the cloud build wipes after each run.

Will it learn from my work? No, we never train on inputs.

A short tour of the workflow

Upload a file or paste a snippet of prose.

Pick the entities you want gone from the draft.

Choose a method: replace, mask, hash, encrypt, or redact.

Press run and watch the side panel show each hit.

Skim the result and tweak any rule that misfired.

Save the cleaned file or send it to a teammate.