
HDPA Greece: AFM and AMKA Detection, and Why Generic NLP Tools Reach Only 52% Accuracy on Greek AFM

Generic tools detect the Greek AFM with only 52% accuracy. HDPA issued 89 decisions in 2024, up 162% from 2022. The tourism and maritime sectors face distinct compliance requirements, and Greek-script text adds NER requirements of its own.

March 7, 2026 · 7 min read
Greece HDPA, AFM AMKA detection, Greek alphabet NER, tourism GDPR, Greek identifiers

Greece's Hellenic Data Protection Authority (HDPA) issued 89 enforcement decisions in 2024, a 162% increase from 34 decisions in 2022. The sharp acceleration reflects both growing HDPA capacity and sector-specific compliance failures in maritime operations and in tourism, which accounts for 38% of HDPA cases.

AFM: Greece's Primary Commercial Identifier

The ΑΦΜ (Αριθμός Φορολογικού Μητρώου, Tax Registration Number) is a 9-digit number assigned to Greek citizens, residents, and businesses for tax administration. The ninth digit is a check digit computed with a weighted sum: multiply digits 1-8 by the weights (256, 128, 64, 32, 16, 8, 4, 2), sum the products, and take the result modulo 11. The check digit is that remainder modulo 10, so a remainder of 10 yields a check digit of 0. A number whose ninth digit does not match is invalid.
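As a minimal sketch, the check can be written in a few lines of Python. It uses the commonly documented form of the rule, (weighted sum mod 11) mod 10, in which a remainder of 10 maps to a check digit of 0. The function names are illustrative, and rejecting the all-zero string is an added assumption rather than part of the rule described here.

```python
import re

# Weights for digits 1-8, as listed in the text (powers of two, 2^8 down to 2^1).
AFM_WEIGHTS = (256, 128, 64, 32, 16, 8, 4, 2)


def afm_check_digit(first_eight: str) -> int:
    """Compute the expected ninth digit from the first eight digits."""
    total = sum(int(d) * w for d, w in zip(first_eight, AFM_WEIGHTS))
    # Two-step reduction: modulo 11, then modulo 10 (so a remainder of 10 becomes 0).
    return (total % 11) % 10


def is_valid_afm(value: str) -> bool:
    """Return True if value is nine ASCII digits with a matching check digit."""
    if not re.fullmatch(r"[0-9]{9}", value):
        return False
    if value == "000000000":  # assumption: treat the all-zero string as invalid
        return False
    return afm_check_digit(value[:8]) == int(value[8])
```

For the digits 12345678 the weighted sum is 1004, 1004 mod 11 is 3, and the full number 123456783 therefore passes.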

AFM appears in all Greek commercial documents — invoices, contracts, employment agreements, and government forms. It is the primary commercial identifier for both individuals and businesses in Greece.

Detection accuracy: Generic NLP tools detect AFM with 52% accuracy (HDPA 2024 analysis). The failure modes:

  • AFM's 9-digit format matches many reference numbers and date components in Greek documents
  • The weighted modulo-11/modulo-10 two-step check digit is not commonly implemented in generic tools
  • Greek documents frequently present AFM without explicit label in context (embedded in address blocks, not labeled "ΑΦΜ:")
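The three failure modes suggest a detector that combines a shape match, the checksum, and an optional label check. The sketch below is one way to do that, not a reference implementation: the 12-character look-behind window, the label spellings, and the `AfmHit` type are all assumptions made for illustration.

```python
import re
from typing import List, NamedTuple

WEIGHTS = (256, 128, 64, 32, 16, 8, 4, 2)
# Exactly nine ASCII digits, not embedded in a longer digit run.
NINE_DIGITS = re.compile(r"(?<![0-9])[0-9]{9}(?![0-9])")
# Greek-letter label (ΑΦΜ, Α.Φ.Μ.) or its Latin transliteration, just before the number.
LABEL = re.compile(r"(?:Α\.?\s?Φ\.?\s?Μ\.?|AFM)\s*[:.]?\s*$")


class AfmHit(NamedTuple):
    value: str
    start: int
    labeled: bool  # True when an explicit label precedes the number


def passes_checksum(value: str) -> bool:
    total = sum(int(d) * w for d, w in zip(value[:8], WEIGHTS))
    return value != "000000000" and (total % 11) % 10 == int(value[8])


def find_afm(text: str) -> List[AfmHit]:
    """Return checksum-valid nine-digit numbers, noting whether each is labeled."""
    hits = []
    for match in NINE_DIGITS.finditer(text):
        if not passes_checksum(match.group()):
            continue  # drops most reference numbers and date fragments
        window = text[max(0, match.start() - 12):match.start()].upper()
        hits.append(AfmHit(match.group(), match.start(), bool(LABEL.search(window))))
    return hits
```

Unlabeled hits are kept rather than discarded, because the text notes that AFM often appears without a label. A caller could score labeled hits higher and send unlabeled ones to review.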

AMKA: Greece's Social Insurance Identifier

The ΑΜΚΑ (Αριθμός Μητρώου Κοινωνικής Ασφάλισης, Social Insurance Registry Number) is an 11-digit number encoding birth date and gender:

  • Digits 1-6: Birth date in DDMMYY format
  • Digit 7: Gender (odd = male, even = female)
  • Digits 8-11: Sequential number with check digit
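The date prefix gives a cheap plausibility filter for 11-digit candidates. The sketch below checks only the DDMMYY portion described above and deliberately leaves the gender digit and check digit uninterpreted. Because a two-digit year is ambiguous, it returns every calendar date that fits in the 1900s or 2000s. The function names and the sample digits are made up for illustration.

```python
import re
from datetime import date
from typing import List


def amka_birth_dates(value: str) -> List[date]:
    """Return the calendar dates the first six digits could encode, or []."""
    if not re.fullmatch(r"[0-9]{11}", value):
        return []
    day, month, yy = int(value[0:2]), int(value[2:4]), int(value[4:6])
    candidates = []
    for century in (1900, 2000):  # the two-digit year does not say which century
        try:
            candidates.append(date(century + yy, month, day))
        except ValueError:
            pass  # e.g. day 32, month 13, or 29 February in a non-leap year
    return candidates


def looks_like_amka(value: str) -> bool:
    """True when value has 11 digits and a plausible DDMMYY prefix."""
    return bool(amka_birth_dates(value))
```

A string such as 15038512345 passes (15 March 1985 or 2085), while 32018512345 is rejected because no month has a day 32.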

The birth date and gender encoding makes AMKA structurally similar to Sweden's personnummer, and it raises the same GDPR concern: the number itself discloses date of birth and recorded sex, so it carries personal data even when no name is attached.

AMKA appears in all Greek healthcare documents, social security filings, and employer records. Every Greek citizen and legal resident has an AMKA, making it the equivalent of a social security number for healthcare and social benefit access.

Greek Alphabet: The NLP Infrastructure Challenge

Greek text uses the Greek alphabet — an entirely different writing system from Latin-script languages. This creates a fundamental infrastructure challenge for PII detection:

Unicode ranges: Greek characters occupy Unicode range U+0370 to U+03FF (Greek and Coptic block) and U+1F00 to U+1FFF (Greek Extended for polytonic forms). Tools that handle only ASCII or Latin Extended characters fail to process Greek text at all.
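A quick way to see whether a pipeline is receiving Greek text at all is to test code points against those two blocks. This is a small sketch with illustrative function names. The blocks also contain a few Coptic letters and symbols, which the check does not distinguish.

```python
# The two Unicode blocks named above: Greek and Coptic, and Greek Extended.
GREEK_RANGES = ((0x0370, 0x03FF), (0x1F00, 0x1FFF))


def is_greek_char(ch: str) -> bool:
    """True if the single character ch falls in either Greek block."""
    cp = ord(ch)
    return any(low <= cp <= high for low, high in GREEK_RANGES)


def greek_ratio(text: str) -> float:
    """Share of alphabetic characters in text that are Greek (0.0 if none)."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(is_greek_char(ch) for ch in letters) / len(letters)
```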

Greek NER models: spaCy's el_core_news models (el_core_news_sm, el_core_news_md, el_core_news_lg) provide Greek NER capability, but the Greek pipeline has to be loaded explicitly. Organizations running a default-language configuration (typically English) will get little or no usable output for Greek-script documents.
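One way to avoid the default-language trap is to pick the pipeline from the text itself. The sketch below routes on the share of Greek letters and then calls `spacy.load`. The 0.3 threshold and the choice of the small models are assumptions, and both models must be installed first (for example with `python -m spacy download el_core_news_sm`).

```python
from typing import List, Tuple


def pick_spacy_model(text: str, threshold: float = 0.3) -> str:
    """Choose a spaCy pipeline name from the share of Greek letters in text."""
    letters = [ch for ch in text if ch.isalpha()]
    greek = sum(
        1 for ch in letters
        if "\u0370" <= ch <= "\u03ff" or "\u1f00" <= ch <= "\u1fff"
    )
    if letters and greek / len(letters) >= threshold:
        return "el_core_news_sm"
    return "en_core_web_sm"


def extract_entities(text: str) -> List[Tuple[str, str]]:
    """Run NER with the pipeline chosen for this text."""
    import spacy  # imported here so the routing helper works without spaCy installed

    nlp = spacy.load(pick_spacy_model(text))  # cache this in real use; loading is slow
    return [(ent.text, ent.label_) for ent in nlp(text).ents]
```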

Mixed-script documents: Greek business and government documents frequently mix Greek script (main content) with Latin script (brand names, technical terms, English annotations). NLP pipelines must handle both scripts in the same document.
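Handling both scripts usually starts with knowing which tokens are in which script. The sketch below tags whitespace-separated tokens as greek, latin, mixed, other, or none. Treating every letter up to U+024F as Latin is a simplification, and the function names are illustrative.

```python
from typing import List, Tuple


def token_script(token: str) -> str:
    """Classify a token by the scripts of its letters."""
    scripts = set()
    for ch in token:
        if not ch.isalpha():
            continue  # digits and punctuation carry no script information
        cp = ord(ch)
        if 0x0370 <= cp <= 0x03FF or 0x1F00 <= cp <= 0x1FFF:
            scripts.add("greek")
        elif cp <= 0x024F:  # Basic Latin through Latin Extended-B
            scripts.add("latin")
        else:
            scripts.add("other")
    if not scripts:
        return "none"
    if len(scripts) == 1:
        return scripts.pop()
    return "mixed"


def tag_tokens(text: str) -> List[Tuple[str, str]]:
    """Split on whitespace and tag each token with its script."""
    return [(tok, token_script(tok)) for tok in text.split()]
```

Tokens tagged mixed deserve a second look: several Greek capitals (Α, Β, Ε, Μ) are visually identical to Latin letters, so a label typed partly in each script will not match a pattern written in only one.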

Name recognition in Greek: Greek names appear in the nominative case (Γεώργιος Παπαδόπουλος) but also in genitive and accusative forms inside sentences (Γεωργίου Παπαδόπουλου in the genitive). Case-aware NER therefore requires Greek morphological analysis.
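To illustrate why case matters, the sketch below reduces the two forms in the example to a shared key by removing accents and a handful of masculine -ος endings. It is a deliberately crude stand-in for real morphological analysis: the suffix list covers only one declension pattern, and a production pipeline would use a proper Greek lemmatizer instead.

```python
import unicodedata

# Endings of masculine names in -ος: nominative, genitive, accusative, vocative.
SUFFIXES = ("ος", "ου", "ο", "ε")


def strip_accents(word: str) -> str:
    """Remove combining marks (tonos and similar) after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def name_key(word: str) -> str:
    """Return a rough case-insensitive, accent-insensitive stem for matching."""
    base = strip_accents(word.lower())
    for suffix in SUFFIXES:
        # Keep at least three characters of stem so short words are left alone.
        if base.endswith(suffix) and len(base) > len(suffix) + 2:
            return base[: -len(suffix)]
    return base
```

With this key, Γεώργιος and Γεωργίου map to the same stem, as do Παπαδόπουλος and Παπαδόπουλου, so a known-names list in the nominative can still match genitive mentions.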

Tourism Sector: Seasonal Data Processing Compliance

Tourism accounts for 38% of HDPA enforcement cases. The compliance challenge is scale and seasonality:

Hotel PMS systems: Property management systems process complete guest information — passport numbers, nationality, birth dates, contact data — for all guests. HDPA enforcement found many hotel PMS systems retaining guest data for 5+ years without documented purpose and without security measures proportionate to the data volume.
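A retention review of this kind can be expressed as a simple filter over exported guest records. The sketch below flags records that are older than a cut-off and have no documented purpose. The `GuestRecord` fields, the five-year default (approximated as 5 x 365 days), and the sample purpose string are hypothetical and do not reflect any particular PMS schema or HDPA rule.

```python
from datetime import date, timedelta
from typing import List, NamedTuple, Optional


class GuestRecord(NamedTuple):
    guest_id: str
    checkout: date
    retention_purpose: Optional[str]  # documented reason for keeping the record, if any


def flag_for_review(
    records: List[GuestRecord], today: date, max_age_days: int = 5 * 365
) -> List[str]:
    """Return ids of records past the cut-off with no documented purpose."""
    cutoff = today - timedelta(days=max_age_days)
    return [
        r.guest_id
        for r in records
        if r.checkout < cutoff and not r.retention_purpose
    ]
```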

IBAN and payment data: Greek tourism businesses process payment data from EU and international guests. Guest folios (hotel bills) contain partial card numbers; reservation systems contain full payment details with expiry dates. PCI DSS compliance overlaps with GDPR requirements for payment data.
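IBANs are among the easier payment identifiers to detect reliably, because the standard mod-97 check rules out most false matches. A minimal validation sketch follows. The function name is illustrative, the only country-specific rule included is the 27-character length of Greek IBANs, and the sample value is the widely published Greek example IBAN.

```python
import re


def is_valid_iban(value: str) -> bool:
    """Validate an IBAN with the ISO 13616 mod-97 check."""
    iban = re.sub(r"\s+", "", value).upper()
    # Country code, two check digits, then 11 to 30 alphanumeric characters.
    if not re.fullmatch(r"[A-Z]{2}[0-9]{2}[A-Z0-9]{11,30}", iban):
        return False
    if iban.startswith("GR") and len(iban) != 27:
        return False
    # Move the first four characters to the end, then map letters to 10..35.
    rearranged = iban[4:] + iban[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


print(is_valid_iban("GR16 0110 1250 0000 0001 2300 695"))  # True
```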

Staff data turnover: Seasonal workers in hospitality typically complete contracts of 4-6 months. HDPA enforcement found repeated failures to revoke system access for departed seasonal staff — a pattern common to any industry with high employee turnover.
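The revocation gap is easy to audit once contract end dates and account status sit side by side. The sketch below lists accounts that are still active after the contract has ended. The `Account` fields are hypothetical, and in practice the two inputs would come from the HR system and the identity provider.

```python
from datetime import date
from typing import List, NamedTuple


class Account(NamedTuple):
    username: str
    contract_end: date
    active: bool


def accounts_to_revoke(accounts: List[Account], today: date) -> List[str]:
    """Return usernames that are still active after their contract ended."""
    return [a.username for a in accounts if a.active and a.contract_end < today]
```

Running such a check on a schedule at the end of each season would surface the pattern the enforcement cases describe.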

For HDPA compliance in Greek-language contexts: AFM and AMKA detection with checksum validation, Greek alphabet NER support (spaCy el_core_news), and Greek passport/national ID detection are the technical requirements. For tourism sector compliance specifically, hotel PMS data retention documentation and seasonal staff access revocation procedures are the additional organizational requirements that HDPA enforcement makes clear.
