

National tax IDs are among the most sensitive PII in the EU. Each country has its own format (DNI in Spain, PESEL in Poland, CNP in Romania).

April 20, 2026 · 7 min read
Tags: EU national identifiers, Steueridentifikationsnummer, Codice Fiscale, NIF, multinational GDPR, tax ID detection

EU National Tax IDs: PII Detection for GDPR Compliance

Why National Tax IDs Are Critical in the EU

National tax IDs (NTIDs) are among the most sensitive PII across the EU:

  • Spain: DNI (Documento Nacional de Identidad) - 8 digits + 1 letter
  • Poland: PESEL - 11 digits (includes birth date)
  • Romania: CNP (Cod Numeric Personal) - 13 digits (includes birth date + gender)
  • France: NIR (Numéro d'Inscription au Répertoire) - 15 digits
  • Germany: Steuer-ID - 11 digits

Why critical? A national tax ID is directly linked to banking, healthcare, and social services. If it leaks, the person it belongs to can become a victim of identity theft.
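To make concrete how much a single identifier reveals, here is a minimal sketch that decodes the birth date embedded in a PESEL. It assumes the commonly documented layout (YYMMDD in the first six digits, with the century encoded as an offset on the month); the function name `pesel_birth_date` is illustrative and not part of any library.

```python
from datetime import date

# Century encoded as an offset added to the month field (assumed PESEL layout).
_CENTURY_BY_MONTH_OFFSET = {80: 1800, 0: 1900, 20: 2000, 40: 2100, 60: 2200}


def pesel_birth_date(pesel: str) -> date:
    """Decode the birth date carried in the first six digits of a PESEL."""
    if len(pesel) != 11 or not (pesel.isascii() and pesel.isdigit()):
        raise ValueError("PESEL must be exactly 11 digits")
    yy, mm, dd = int(pesel[0:2]), int(pesel[2:4]), int(pesel[4:6])
    offset = (mm // 20) * 20  # 0, 20, 40, 60 or 80
    century = _CENTURY_BY_MONTH_OFFSET[offset]
    # date() raises ValueError for impossible dates such as month 13
    return date(century + yy, mm - offset, dd)


print(pesel_birth_date("85010112345"))  # 1985-01-01
```

Anyone holding the number therefore also holds a date of birth, which is one reason a leaked ID is so useful for identity theft.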

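The GDPR Risk: Tax ID Leakage

National tax IDs are personal data under GDPR Article 4(1), and several provisions shape how they must be handled:

  • Article 87: Member States may set specific conditions for processing a national identification number
  • Article 9(1): applies when the ID is processed together with special-category data such as health records (prohibited unless an Article 9(2) exception, such as explicit consent, applies)
  • Article 32: security measures appropriate to the risk, such as pseudonymisation and encryption
  • Articles 33 and 34: breach notification to the supervisory authority (DPA) and, where the risk is high, to the affected data subjects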

Many organizations miss national tax IDs during detection because:

  1. No standard format - every country uses a different one
  2. Complex validation - many country-specific checksum algorithms
  3. Non-English context - the labels around an ID appear in local languages (Czech, Greek, Polish, and so on), so English-only context words miss them
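As an example of the country-specific validation mentioned in point 2, the sketch below checks the control letter of a Spanish DNI: the 8-digit number modulo 23 indexes into a fixed 23-letter table. The function name `dni_is_valid` is illustrative.

```python
# Fixed lookup table for the DNI control letter (number mod 23 -> letter).
DNI_LETTERS = "TRWAGMYFPDXBNJZSQVHLCKE"


def dni_is_valid(dni: str) -> bool:
    """Return True if the candidate is 8 digits plus the matching control letter."""
    dni = dni.strip().upper()
    if len(dni) != 9 or not dni.isascii() or not dni[:8].isdigit():
        return False
    return DNI_LETTERS[int(dni[:8]) % 23] == dni[8]


print(dni_is_valid("12345678Z"))  # True
print(dni_is_valid("12345678A"))  # False (right shape, wrong control letter)
```

A plain regex accepts both strings; only the checksum tells them apart.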

The Solution: EU Tax ID Recognizer

Presidio Analyzer supports custom recognizers, so a recognizer can be defined for each of the 27 EU national tax ID formats:

Step 1: Define the EU Tax ID Patterns

{
  "tax_id_patterns": {
    "ES_DNI": {
      "regex": "\\b\\d{8}[A-Z]\\b",
      "example": "12345678Z",
      "validation": "checksum_dni",
      "risk_level": "CRITICAL"
    },
    "PL_PESEL": {
      "regex": "\\b\\d{11}\\b",
      "example": "85010112345",
      "validation": "contains_birth_date",
      "risk_level": "CRITICAL"
    },
    "RO_CNP": {
      "regex": "\\b\\d{13}\\b",
      "example": "1850101123456",
      "validation": "checksum_cnp + gender",
      "risk_level": "CRITICAL"
    },
    "FR_NIR": {
      "regex": "\\b\\d{15}\\b",
      "example": "185010112345678",
      "validation": "contains_birth_date",
      "risk_level": "CRITICAL"
    }
  }
}
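A pattern file like this is only useful once it is loaded and sanity-checked. The following sketch uses only the standard library; the `load_patterns` helper and the abbreviated `CONFIG_JSON` string are hypothetical and shown for illustration. Note that backslashes must be doubled inside JSON strings, otherwise `json.loads` rejects the file.

```python
import json
import re

# Abbreviated, hypothetical copy of the pattern file (backslashes doubled for JSON).
CONFIG_JSON = r"""
{
  "tax_id_patterns": {
    "ES_DNI":   {"regex": "\\b\\d{8}[A-Z]\\b", "example": "12345678Z"},
    "PL_PESEL": {"regex": "\\b\\d{11}\\b",     "example": "85010112345"}
  }
}
"""


def load_patterns(raw: str) -> dict[str, re.Pattern]:
    """Compile each regex and fail fast if it does not match its own example."""
    config = json.loads(raw)["tax_id_patterns"]
    compiled = {}
    for entity, spec in config.items():
        pattern = re.compile(spec["regex"])
        if not pattern.fullmatch(spec["example"]):
            raise ValueError(f"{entity}: example does not match its regex")
        compiled[entity] = pattern
    return compiled


patterns = load_patterns(CONFIG_JSON)
print(sorted(patterns))  # ['ES_DNI', 'PL_PESEL']
```

Checking every example against its own regex at load time catches typos in the config before they silently reduce detection coverage.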

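Step 2: Implement the Tax ID Recognizer

from presidio_analyzer import AnalyzerEngine, Pattern, PatternRecognizer

# The default AnalyzerEngine loads an English spaCy model, which must be installed.
analyzer = AnalyzerEngine()

# Spanish DNI: 8 digits followed by a control letter.
# Pattern needs a base score (values here are illustrative); context words raise it.
dni_recognizer = PatternRecognizer(
    supported_entity="ES_DNI",
    patterns=[Pattern(name="dni", regex=r"\b\d{8}[A-Z]\b", score=0.5)],
    context=["DNI", "identificación"],
)

# Polish PESEL: 11 digits. The bare pattern is weak, so the base score is lower.
pesel_recognizer = PatternRecognizer(
    supported_entity="PL_PESEL",
    patterns=[Pattern(name="pesel", regex=r"\b\d{11}\b", score=0.3)],
    context=["PESEL", "numer"],
)

analyzer.registry.add_recognizer(dni_recognizer)
analyzer.registry.add_recognizer(pesel_recognizer)

# Test (language is a required argument of analyze)
results = analyzer.analyze(
    text="Spanish DNI: 12345678Z, Polish PESEL: 85010112345",
    language="en",
    entities=["ES_DNI", "PL_PESEL"],
)
print(results)
# Expected: two results, one ES_DNI and one PL_PESEL, each with offsets and a score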
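A bare 11-digit pattern will also match phone numbers, order numbers, and other noise. One way to cut those false positives is a checksum filter; the sketch below implements the PESEL control digit (weights 1, 3, 7, 9 repeated, modulo 10) as a plain function. The name `pesel_checksum_ok` is illustrative. One plausible place to call it is an overridden `validate_result` method on a `PatternRecognizer` subclass, though that wiring is an assumption and is not shown here.

```python
# Weights applied to the first ten digits of a PESEL.
PESEL_WEIGHTS = (1, 3, 7, 9, 1, 3, 7, 9, 1, 3)


def pesel_checksum_ok(candidate: str) -> bool:
    """Return True if the 11th digit matches the weighted mod-10 control digit."""
    if len(candidate) != 11 or not (candidate.isascii() and candidate.isdigit()):
        return False
    digits = [int(ch) for ch in candidate]
    total = sum(w * d for w, d in zip(PESEL_WEIGHTS, digits))
    return (10 - total % 10) % 10 == digits[10]


print(pesel_checksum_ok("85010112345"))  # True
print(pesel_checksum_ok("85010112346"))  # False
```

Roughly nine out of ten random 11-digit strings fail this check, so it removes most accidental matches before any context scoring happens.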

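Step 3: Anonymize Tax IDs

from presidio_anonymizer import AnonymizerEngine
from presidio_anonymizer.entities import OperatorConfig

anonymizer = AnonymizerEngine()
text = "Patient with DNI 12345678Z and PESEL 85010112345"

# Analyze this exact text so the result offsets line up with it
# (reuses the analyzer configured in Step 2).
results = analyzer.analyze(
    text=text,
    language="en",
    entities=["ES_DNI", "PL_PESEL"],
)

anonymized = anonymizer.anonymize(
    text=text,
    analyzer_results=results,
    operators={
        "ES_DNI": OperatorConfig("replace", {"new_value": "<DNI>"}),
        "PL_PESEL": OperatorConfig("replace", {"new_value": "<PESEL>"}),
    },
)
print(anonymized.text)
# "Patient with DNI <DNI> and PESEL <PESEL>"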

EU Tax ID Coverage (27 Countries)

Country   | Identifier      | Format                | Checksum
----------|-----------------|-----------------------|-------------------------------
Spain     | DNI             | 8 digits + 1 letter   | Yes (mod 23)
Poland    | PESEL           | 11 digits             | Yes (mod 10)
Romania   | CNP             | 13 digits             | Yes
France    | NIR             | 15 digits             | Yes
Germany   | Steuer-ID       | 11 digits             | Yes
Italy     | Codice Fiscale  | 16 alphanumeric       | Yes
Greece    | AFM             | 9 digits              | Yes
Czechia   |                 | 10 digits             | Yes
Hungary   | TAJ-szám        | 9 digits              | Yes (mod 10, weights 3 and 7)
Slovakia  |                 | 10 digits             | Yes
Slovenia  | EMŠO            | 13 digits             | Yes (mod 11)
Croatia   | OIB             | 11 digits             | Yes (ISO 7064, MOD 11,10)
Bulgaria  | EGN             | 10 digits             | Yes (mod 11)
Lithuania | Asmens kodas    | 11 digits             | Yes
Latvia    | Personas kods   | 11 digits             | Yes
Estonia   | Isikukood       | 11 digits             | Yes (mod 11)

+ 11 others
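The checksum column hides quite different algorithms. Croatia's OIB, for instance, uses the ISO 7064 MOD 11,10 scheme instead of a simple weighted sum. A minimal sketch follows, with the illustrative function name `oib_is_valid`.

```python
def oib_is_valid(oib: str) -> bool:
    """Validate an 11-digit OIB using the ISO 7064 MOD 11,10 control digit."""
    if len(oib) != 11 or not (oib.isascii() and oib.isdigit()):
        return False
    remainder = 10
    for ch in oib[:10]:
        remainder = (remainder + int(ch)) % 10
        if remainder == 0:
            remainder = 10
        remainder = (remainder * 2) % 11
    # 11 - remainder lies in 1..10; a result of 10 maps to control digit 0
    control = (11 - remainder) % 10
    return control == int(oib[10])


print(oib_is_valid("12345678903"))  # True
print(oib_is_valid("12345678904"))  # False
```

Because each country needs its own routine like this one, a recognizer library usually ends up with one small validator per identifier.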

Real-World Case: CNIL France Enforcement

In 2023, the CNIL fined a healthcare provider €50,000 for:

  • ❌ National tax IDs (NIR) not anonymized in shared documents
  • ❌ NIRs joined with patient names in the analytics warehouse
  • ❌ No audit trail of who accessed the data

With an EU tax ID recognizer in place:

Detection: 15-digit NIR patterns detected
Anonymization: all NIRs replaced with <NIR>
Audit log: 145 NIRs detected and anonymized
GDPR evidence: supports compliance with Articles 32 and 34
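An audit line such as the one above can be produced by counting detections per entity type. The sketch below is one way to do that; the `Detection` class is a hypothetical stand-in for an analyzer result (Presidio's `RecognizerResult` exposes an `entity_type` attribute as well), and `audit_summary` is an illustrative name.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical stand-in for one analyzer result."""
    entity_type: str
    start: int
    end: int


def audit_summary(detections) -> dict:
    """Count detections per entity type without storing the matched values."""
    return dict(Counter(d.entity_type for d in detections))


sample = [
    Detection("FR_NIR", 10, 25),
    Detection("FR_NIR", 40, 55),
    Detection("ES_DNI", 60, 69),
]
print(audit_summary(sample))  # {'FR_NIR': 2, 'ES_DNI': 1}
```

The summary deliberately records counts and offsets only; writing the raw identifiers into the audit log would recreate the leak the log is meant to document.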

The Benefits

  • ✅ 100% EU coverage - all 27 national tax ID formats
  • ✅ Validation - checksum verification reduces false positives
  • ✅ Context awareness - detects language-specific tax IDs (DE, FR, ES, PL, RO, etc.)
  • ✅ Compliance evidence - demonstrates that all tax IDs were anonymized

Best Practices

  1. Enable the EU tax ID recognizers in your Presidio configuration
  2. Validate extensively against realistic ID formats, using synthetic (fake) test data and never real IDs
  3. Audit logs - track how many IDs are detected per country
  4. Quarterly updates - add new countries as the EU expands
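For the synthetic test data mentioned in point 2, identifiers can be generated with a correct control digit so that they pass validation without being copied from real records. The sketch below does this for PESEL; `synthetic_pesel` is an illustrative name, and the date range and serial digits are arbitrary assumptions. A generated value could still coincide with a real person's number, so treat the output as test fixtures only.

```python
import random

# Weights applied to the first ten digits of a PESEL.
PESEL_WEIGHTS = (1, 3, 7, 9, 1, 3, 7, 9, 1, 3)


def synthetic_pesel(rng: random.Random) -> str:
    """Build a well-formed PESEL for a random 20th-century date."""
    year = rng.randint(1900, 1999)   # 1900s need no month offset
    month = rng.randint(1, 12)
    day = rng.randint(1, 28)         # 28 keeps every month valid
    serial = rng.randint(0, 9999)
    body = f"{year % 100:02d}{month:02d}{day:02d}{serial:04d}"
    total = sum(w * int(ch) for w, ch in zip(PESEL_WEIGHTS, body))
    return body + str((10 - total % 10) % 10)


rng = random.Random(42)  # fixed seed keeps test fixtures reproducible
samples = [synthetic_pesel(rng) for _ in range(3)]
print(samples)
```

Feeding such samples through the recognizers gives a repeatable regression test for each country without touching production data.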

National tax ID detection is a practical necessity for any organization that processes EU personal data and has GDPR compliance requirements.
