Last updated 2026-05-29


GDPR pipeline: anonymize PII before storing it

Tagging columns in dbt is not GDPR compliance. Raw customer data arrives in your Snowflake warehouse unmasked, before any tag-based policies are applied.

May 29, 2026 · 8 min read
data pipeline · dbt · Snowflake · data warehouse · ELT anonymization · GDPR engineering

GDPR-safe pipeline: anonymizing PII before storage

Updated for 2026

You tagged your PII columns in dbt. You set up dynamic masking in Snowflake. You feel GDPR-compliant.

Your source data still landed in the warehouse unmasked. Masking runs at query time. The unmasked data sits in your raw schema, and anyone with access to that schema can read it. Your dbt models ran before the masking policies existed. Older ingested tables were never masked.

The gap between the claim "we have masking policies" and the claim "our pipeline is safe" is where GDPR violations happen.

See our compliance overview for how anonym.legal supports GDPR.

How ELT pipelines expose PII

The Extract-Load-Transform (ELT) pattern is now the norm. It loads source data into the warehouse first; transformations come later. The steps look like this:

  1. Extract: Source systems export every field. Salesforce CRM, Stripe payments, Intercom support: everything comes out.
  2. Load: Source data lands in the warehouse's ingestion schema. Snowflake, BigQuery, and Redshift work the same way. Every PII field is included.
  3. Transform: dbt models clean and join the data for analysis.

The ingestion layer holds complete personal information: names, email addresses, phone numbers, payment details, support ticket text. In many teams, engineers and analysts have access to the raw schema and can query these tables at any time.

Tag-based masking in Snowflake helps at query time, but only for correctly configured downstream models. It does not mask old ingested tables. It does not block direct queries against the schema. Every model and dashboard has to be tagged, and that burden grows with the schema.

Anonymizing before load

Anonymizing PII at the pipeline level removes the risk in the raw layer. Do it before the data lands in the warehouse.

ETL approach (anonymize before load):

  1. Extract from the source systems
  2. Run the records through an anonymization step
  3. Load the clean output into the warehouse

The warehouse never receives unmasked PII. The ingestion schema holds only clean data. Downstream models, dashboards, and direct queries all work with the clean output.
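The ordering above can be sketched in a few lines of Python. This is a minimal illustration, not a real integration: `anonymize_record` is a hypothetical stand-in that only catches email addresses with a regular expression, and the source and warehouse are plain in-memory lists. The point is only where the anonymization step sits relative to the load.

```python
import re
from typing import Callable, Iterable

# Hypothetical stand-in for a real anonymization step: it only catches
# email addresses, which is enough to show where the step sits.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def anonymize_record(record: dict) -> dict:
    """Return a copy of the record with email addresses replaced."""
    return {
        key: EMAIL_RE.sub("<EMAIL_ADDRESS>", value) if isinstance(value, str) else value
        for key, value in record.items()
    }


def run_etl(
    extract: Callable[[], Iterable[dict]],
    anonymize: Callable[[dict], dict],
    load: Callable[[dict], None],
) -> None:
    """Extract, anonymize, then load, so load() never sees raw values."""
    for record in extract():
        load(anonymize(record))


# In-memory stand-ins for a source system and a warehouse table.
source_rows = [{"ticket_id": 1, "body": "Contact me at jan@priklad.sk please"}]
warehouse_rows: list[dict] = []

run_etl(lambda: source_rows, anonymize_record, warehouse_rows.append)
print(warehouse_rows)
```

Because `load` only ever receives the output of `anonymize`, there is no code path on which a raw record can reach the warehouse list.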

You have two main paths.

Option 1 - API integration:

For systems with webhooks or streaming exports, route records through the anonym.legal API first. Support tickets leaving Intercom pass through the API before they reach the warehouse. Stripe exports do the same.

POST /api/anonymize
{
  "text": "Customer Jan Novak (jan@priklad.sk) reported...",
  "entities": ["PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER"],
  "method": "replace"
}
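A request like the one above could be assembled with the Python standard library as follows. This is a sketch under assumptions: the host `anonymizer.example` is a placeholder, authentication is omitted because the article does not describe it, and the response format is not shown here for the same reason. Only the path and the three body fields come from the example above.

```python
import json
import urllib.request

# Assumption: placeholder host, not the real service address.
BASE_URL = "https://anonymizer.example"


def build_anonymize_request(
    text: str, entities: list[str], method: str = "replace"
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the anonymize endpoint."""
    payload = {"text": text, "entities": entities, "method": method}
    return urllib.request.Request(
        url=f"{BASE_URL}/api/anonymize",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_anonymize_request(
    "Customer Jan Novak (jan@priklad.sk) reported...",
    ["PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER"],
)
print(request.get_method(), request.full_url)
```

Sending it would be a call to `urllib.request.urlopen(request)`; in a pipeline that call belongs before any write to the warehouse, with failures treated as a reason to stop the load rather than fall back to raw text.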

Option 2 - Batch preprocessing:

For daily or weekly CSV/JSON file exports, run the files through batch processing before loading.

Airflow DAG structure:

extract_task >> anonymize_batch_task >> load_to_warehouse_task

The anonymize task uploads the files and gets clean versions back. The load task takes care of the rest.
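As a rough idea of what the middle task might do for a CSV export, here is a sketch using only the standard library. The function name, the `pii_columns` parameter, and the `anonymize` callable are all hypothetical; in practice the callable would be whatever performs the real anonymization, such as a call to the batch service.

```python
import csv
import io
from typing import Callable


def anonymize_csv(
    raw_csv: str, pii_columns: set[str], anonymize: Callable[[str], str]
) -> str:
    """Rewrite a CSV export, passing every value in a PII column through anonymize()."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for column in pii_columns:
            if column in row:
                row[column] = anonymize(row[column])
        writer.writerow(row)
    return out.getvalue()


raw = "id,email,plan\n1,jan@priklad.sk,pro\n"
# Placeholder anonymizer: replaces any value with a fixed label.
clean = anonymize_csv(raw, {"email"}, lambda value: "<EMAIL_ADDRESS>")
print(clean)
```

A function of this shape could serve as the Python callable behind `anonymize_batch_task`, so that `load_to_warehouse_task` only ever reads the rewritten file.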

See our security practices page for details on sub-processors and data flows.

What dbt column tags do and don't do

dbt lets you tag PII columns:

models:
  - name: stg_customers
    columns:
      - name: email
        tags: ['pii', 'email']
      - name: full_name
        tags: ['pii', 'personal_data']

Tags let you:

  • Document where PII lives
  • Trigger downstream masking policies (requires warehouse-level setup)
  • Track lineage with tools such as Secoda

Tags do not:

  • Mask ingested tables in the raw schema
  • Block direct queries against tables
  • Anonymize data at load time
  • Retroactively mask old data

dbt column tags are a governance tool. They show you where PII lives. They do not implement the "appropriate technical measures" that GDPR Article 32 requires.

The Snowflake masking gap

Snowflake dynamic masking hides column contents from users at query time. It is a strong control for production use, but it has clear limits.

Key limits:

  • Every new column needs an explicit policy
  • Schema changes can leave new columns unmasked until you update the policies
  • Highly privileged roles such as SYSADMIN and ACCOUNTADMIN can end up bypassing masking, depending on how policies are written and granted
  • Import jobs often run with high privileges that skip masking
  • Old data loaded before the policies were set up is stored in plaintext, because policies run at read time, not write time

Query-time masking is not enough. Data has to be clean before it is stored.
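The difference between masking on read and cleaning on write can be shown with a toy model. This is not Snowflake code and does not reproduce its policy engine; the role names and the `exempt_roles` idea are assumptions made purely for illustration.

```python
MASK = "***MASKED***"


def read_time_view(stored_rows: list[dict], role: str, exempt_roles: set[str]) -> list[dict]:
    """Toy query-time policy: mask on read unless the role is exempt."""
    return [
        row if role in exempt_roles else {**row, "email": MASK}
        for row in stored_rows
    ]


def write_time_store(incoming_rows: list[dict]) -> list[dict]:
    """Toy write-time cleaning: the value is replaced before it is stored."""
    return [{**row, "email": MASK} for row in incoming_rows]


incoming = [{"id": 1, "email": "jan@priklad.sk"}]

raw_table = list(incoming)                # ELT: plaintext is what gets stored
clean_table = write_time_store(incoming)  # ETL: only the cleaned value is stored

analyst_view = read_time_view(raw_table, "ANALYST", {"LOADER"})
loader_view = read_time_view(raw_table, "LOADER", {"LOADER"})

print(analyst_view[0]["email"], loader_view[0]["email"], clean_table[0]["email"])
```

In the read-time case the plaintext still exists in `raw_table` and is visible to any exempt role. In the write-time case there is no plaintext left for any role to see.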

Compliance documentation

The GDPR accountability principle requires evidence. Words are not enough. For engineering teams, that means written records.

Records of processing activities (ROPA): Document that customer information is anonymized before it is loaded into the analytics warehouse. The anonymization step is itself a processing activity under GDPR.

Technical measures notes: Record which entity types your pipeline targets. Record the anonymization method used. Batch run logs give you this for free.
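One possible shape for such a run log entry is sketched below. The field names are hypothetical and not taken from any particular tool; the idea is simply to capture the entity types, the method, and when the run happened, in a form that can be kept alongside the pipeline logs.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def build_run_record(
    source_table: str,
    entity_types: list[str],
    method: str,
    rows_processed: int,
    run_at: Optional[datetime] = None,
) -> dict:
    """Describe one anonymization run as a JSON-serializable record."""
    run_at = run_at or datetime.now(timezone.utc)
    return {
        "source_table": source_table,
        "entity_types": sorted(entity_types),
        "method": method,
        "rows_processed": rows_processed,
        "run_at": run_at.isoformat(),
    }


record = build_run_record(
    source_table="support_tickets",
    entity_types=["PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER"],
    method="replace",
    rows_processed=1200,
    run_at=datetime(2026, 5, 29, tzinfo=timezone.utc),
)
print(json.dumps(record))
```

Writing one such line per batch run gives an append-only history that can be pointed to when documenting technical measures.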

Data lineage: Secoda or dbt's built-in lineage can demonstrate that source tables flow through the anonymization step before reaching analytics models. This is your audit trail.

Vendor register: The anonymization service is a sub-processor. Its DPA and privacy policy must be in your vendor register.

Implementation steps

For dbt and Snowflake pipelines:

Step 1: Audit your raw layer

Find out which tables hold personal information. Query your dbt column tags or your catalog for tables tagged as PII.
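If the dbt tags are the starting point, one way to list tagged columns is to walk the `manifest.json` artifact that dbt writes on compile. The sketch below assumes the manifest has already been loaded into a dict (for example with `json.load`) and that nodes carry a `columns` mapping with a `tags` list per column; the exact layout can differ between dbt versions, so treat this as illustrative.

```python
def find_pii_columns(manifest: dict, tag: str = "pii") -> dict[str, list[str]]:
    """Map each model name to the columns that carry the given tag."""
    found: dict[str, list[str]] = {}
    for node in manifest.get("nodes", {}).values():
        columns = [
            name
            for name, column in node.get("columns", {}).items()
            if tag in column.get("tags", [])
        ]
        if columns:
            found[node["name"]] = sorted(columns)
    return found


# Inline sample shaped like a small part of a manifest.
sample_manifest = {
    "nodes": {
        "model.shop.stg_customers": {
            "name": "stg_customers",
            "columns": {
                "email": {"name": "email", "tags": ["pii", "email"]},
                "full_name": {"name": "full_name", "tags": ["pii", "personal_data"]},
                "plan": {"name": "plan", "tags": []},
            },
        }
    }
}

print(find_pii_columns(sample_manifest))
```

This only finds what has been tagged. Raw tables that dbt does not model, or columns nobody tagged, still need a manual check.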

Step 2: Set the anonymization scope

For each source table, decide which columns hold PII. Then decide which need anonymization and which need pseudonymization. Support ticket body text: anonymize. Order ID: pseudonymize to preserve join keys. Timestamp: keep for time-based analysis.
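These per-column decisions can be written down as data so the pipeline can check them. The sketch below is one possible layout, with hypothetical table and column names; the helper flags any column that has no recorded decision, so a newly added source field is noticed instead of being loaded silently.

```python
from enum import Enum


class Action(Enum):
    ANONYMIZE = "anonymize"
    PSEUDONYMIZE = "pseudonymize"
    KEEP = "keep"


# Hypothetical scope for one source table, following the examples in the text.
SCOPE = {
    "support_tickets": {
        "body": Action.ANONYMIZE,
        "order_id": Action.PSEUDONYMIZE,
        "created_at": Action.KEEP,
    }
}


def undecided_columns(table: str, columns: list[str], scope: dict = SCOPE) -> list[str]:
    """Return the columns of a table that have no recorded decision yet."""
    decided = scope.get(table, {})
    return sorted(column for column in columns if column not in decided)


print(undecided_columns("support_tickets", ["body", "order_id", "created_at", "agent_name"]))
```

Failing the load when this list is non-empty is a conservative default: an unknown column is treated as possible PII until someone decides otherwise.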

Step 3: Choose an implementation path

Small team with batch exports: use batch file processing before load. Engineering capacity available: build an API integration in Airflow or Prefect.

Step 4: Test and validate

Run anonymization on a sample before going to production. Check that your dbt models still work. Some models join on email; those need consistent replacement values. Pseudonymization preserves join keys. Redaction breaks them.
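The join-key point can be checked on a small sample. The sketch below uses a keyed HMAC-SHA256 as the pseudonymization function; that is one common choice and an assumption here, not necessarily what any particular tool does. The key, table contents, and helper names are made up for the example.

```python
import hashlib
import hmac

# Demo key only; a real key would come from a secrets manager.
KEY = b"demo-key-not-for-production"


def pseudonymize(value: str, key: bytes = KEY) -> str:
    """Map a value to a stable token: same input and key give the same token."""
    normalized = value.strip().lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def redact(value: str) -> str:
    """Replace every value with the same label, discarding identity."""
    return "<REDACTED>"


def join_on_email(customers: list[dict], orders: list[dict], transform) -> list[tuple]:
    """Apply transform to the email in both tables, then join on it."""
    left = [{**c, "email": transform(c["email"])} for c in customers]
    right = [{**o, "email": transform(o["email"])} for o in orders]
    return [(c["plan"], o["total"]) for c in left for o in right if c["email"] == o["email"]]


customers = [
    {"email": "Jan@priklad.sk", "plan": "pro"},
    {"email": "eva@priklad.sk", "plan": "free"},
]
orders = [{"email": "jan@priklad.sk", "total": 42}]

print(join_on_email(customers, orders, pseudonymize))  # one correct match
print(join_on_email(customers, orders, redact))        # every customer matches
```

With pseudonymization the order still joins to exactly one customer. With redaction every row carries the same value, so the join matches the order to both customers and the result is wrong.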

Step 5: Handle legacy raw tables

Data loaded before anonymization was introduced needs retroactive processing. Export, anonymize, reload. This is a one-time job per table.

Conclusion

Tag-based masking shows where PII lives. It does not stop users with schema access from reading it. For real GDPR compliance, PII has to be cleaned out before the data reaches the warehouse. That makes the ingestion layer as safe as the production layer.

This is harder than tagging columns. But it is what "appropriate technical measures" actually means.

