
CNIL France: What Europe's Most Technically Demanding DPA Requires from PII Tools

CNIL processed 16,433 complaints in 2023 (+43%). 63% of CNIL notices cite inadequate AI anonymization. NIR/French SSN missed by 78% of generic tools. CNIL's 6-category anonymization guide requirements.

March 7, 2026 · 9 min read
France CNIL, NIR French SSN, GDPR anonymization, French data protection, AI training data

France's Commission Nationale de l'Informatique et des Libertés (CNIL) is the EU's most technically demanding data protection authority. While other DPAs focus primarily on procedural compliance, the CNIL publishes detailed technical guidance — "recommandations" — that set specific algorithmic standards for anonymization, pseudonymization, and AI data governance. 63% of CNIL formal notices in 2024 cited inadequate anonymization in AI systems.

CNIL's Technical Influence Beyond France

The CNIL's technical guidance is routinely cited by other EU DPAs:

Guide pratique de l'anonymisation (2023): CNIL's practical anonymization guide covers k-anonymity, l-diversity, differential privacy, and their practical application to French datasets. 12+ EU DPAs reference this guide in their own enforcement guidance (including IMY Sweden, which produced its own version based in part on CNIL methodology).

AI systems guidance (2024): CNIL's AI governance guidance covers 6 mandatory anonymization categories for AI training data — the most specific EU DPA guidance on this topic.

Cookie technical requirements: CNIL's cookie enforcement guidance (regularly updated) requires specific technical implementations for consent management platforms — the most technically specific DPA guidance on consent technology in the EU.

The NIR: France's Most Sensitive Identifier

The Numéro d'Inscription au Répertoire (NIR), also called the numéro de sécurité sociale, is a 15-character French social security number (a 13-character body followed by a 2-digit key) in the format:

S AA MM DD CCC OOO KK

Where:

  • S = 1 digit: sex (1=male, 2=female)
  • AA = 2 digits: year of birth
  • MM = 2 digits: month of birth
  • DD = 2 characters: department of birth (01-95, 2A/2B for Corsica, 97-98 for overseas departments and territories, 99 for foreign birth)
  • CCC = 3 digits: municipality code within department
  • OOO = 3 digits: birth order number
  • KK = 2 digits: check key, computed as 97 - (first 13 digits mod 97)
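The key arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration rather than reference code from CNIL: the function names `nir_key` and `is_valid_nir` are hypothetical, the sample number is synthetic, and the substitution of 19 for 2A and 18 for 2B is the commonly documented convention for making Corsican numbers numeric before the modulo.

```python
def nir_key(body: str) -> int:
    """Return the control key for the 13 characters that precede it."""
    body = body.upper()
    if len(body) != 13:
        raise ValueError("expected the 13 characters that precede the key")
    # Assumption: Corsican department codes are mapped to digits first
    # (2A -> 19, 2B -> 18), since the modulo needs a purely numeric value.
    if body[5:7] == "2A":
        body = body[:5] + "19" + body[7:]
    elif body[5:7] == "2B":
        body = body[:5] + "18" + body[7:]
    if not (body.isascii() and body.isdigit()):
        raise ValueError("unexpected non-digit character")
    return 97 - (int(body) % 97)


def is_valid_nir(nir: str) -> bool:
    """Check length and control key of a NIR, ignoring whitespace."""
    compact = "".join(nir.split())
    key = compact[13:]
    if len(compact) != 15 or not (key.isascii() and key.isdigit()):
        return False
    try:
        return nir_key(compact[:13]) == int(key)
    except ValueError:
        return False


# Synthetic example: 1850578006084 mod 97 = 6, so the key is 97 - 6 = 91.
print(is_valid_nir("1 85 05 78 006 084 91"))  # True
print(is_valid_nir("1 85 05 78 006 084 90"))  # False
```

A key check of this kind only confirms internal consistency. It does not confirm that the department or municipality code exists, which would need a geographic reference table.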

The NIR encodes sex, birth date, birth location, and birth order — making it among the most information-rich national identifiers in the EU. CNIL classifies NIR as requiring heightened protection equivalent to special category data.

Detection challenge: Generic NLP tools miss NIR in 78% of documents according to CNIL's 2024 analysis. The specific failures:

  • NIR's 15-digit structure (without separators in many documents) is confused with other long number sequences
  • The department/municipality encoding (characters 6-10) requires geographic knowledge to validate, and tools that don't implement the mod-97 key calculation cannot distinguish valid NIR numbers from other 15-digit sequences
  • Corsican departments (2A/2B — letters, not digits) break pattern-matching tools that expect only numeric characters
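One way to address the three failure modes above is to pair a tolerant pattern with the key check. The sketch below is an assumption-laden illustration, not a complete detector: the name `find_nirs` and the choice of separators (space, dot, hyphen) are hypothetical, the pattern only accepts a leading 1 or 2 as described in this article, and the 2A/2B substitution (19 and 18) follows the commonly documented convention.

```python
import re

# Candidate: sex, year, month, department (digits or 2A/2B), municipality,
# birth order, key. Groups may be run together or separated by one
# space, dot, or hyphen. re.ASCII keeps \d to the digits 0-9.
NIR_CANDIDATE = re.compile(
    r"(?<![0-9A-Za-z])"
    r"([12])[ .-]?(\d{2})[ .-]?(\d{2})[ .-]?(\d{2}|2[ABab])"
    r"[ .-]?(\d{3})[ .-]?(\d{3})[ .-]?(\d{2})"
    r"(?![0-9A-Za-z])",
    re.ASCII,
)


def find_nirs(text: str) -> list[str]:
    """Return candidates whose mod-97 key is consistent, without separators."""
    hits = []
    for match in NIR_CANDIDATE.finditer(text):
        groups = [g.upper() for g in match.groups()]
        # Map Corsican department codes to digits for the arithmetic only.
        dept = {"2A": "19", "2B": "18"}.get(groups[3], groups[3])
        numeric_body = "".join(groups[:3] + [dept] + groups[4:6])
        if 97 - int(numeric_body) % 97 == int(groups[6]):
            hits.append("".join(groups))
    return hits


sample = (
    "Patient 185057800608491 seen on 12/03; invoice 123456789012345; "
    "second patient 1 85 05 2A 006 084 35."
)
print(find_nirs(sample))
# ['185057800608491', '185052A00608435']
```

The invoice number has the right length and a plausible first digit, but its key does not match, so it is dropped. That is the distinction a length-only pattern cannot make.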

SIREN/SIRET: Business Identifiers in French Documents

SIREN number: 9-digit French company identification number with Luhn check digit. Appears in all French commercial documents.

SIRET number: 14-digit extension of SIREN (9-digit SIREN + 5-digit establishment number). The SIRET uniquely identifies a specific business establishment, while SIREN identifies the company entity.

Business documents frequently contain SIRET numbers alongside personal data of company representatives — CNIL's enforcement guidance treats the combination of SIRET + individual name as creating identifiable information that triggers GDPR obligations.
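The Luhn check mentioned above can be sketched as follows. The function names are hypothetical and the sample numbers are made up for illustration (they pass the checksum but are not meant to refer to any real company). The sketch applies the plain Luhn rule to both lengths and does not handle any special cases.

```python
def luhn_ok(digits: str) -> bool:
    """Standard Luhn check: double every second digit from the right."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def _compact(value: str) -> str:
    # Drop the spaces commonly used to group the digits.
    return "".join(value.split())


def is_valid_siren(value: str) -> bool:
    compact = _compact(value)
    return (
        len(compact) == 9
        and compact.isascii()
        and compact.isdigit()
        and luhn_ok(compact)
    )


def is_valid_siret(value: str) -> bool:
    compact = _compact(value)
    return (
        len(compact) == 14
        and compact.isascii()
        and compact.isdigit()
        and luhn_ok(compact)
    )


print(is_valid_siren("123 456 782"))        # True
print(is_valid_siret("123 456 782 00010"))  # True
print(is_valid_siren("123 456 789"))        # False
```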

CNIL's AI Anonymization Requirements

CNIL's 2024 AI guidance requires 6 specific anonymization categories for AI training data involving French personal data:

  1. Identifier removal: Explicit identifiers (name, NIR, SIREN) must be replaced with pseudonyms or removed
  2. Quasi-identifier generalization: Attributes that could enable re-identification in combination (age, department, profession) must be generalized to reduce specificity
  3. Noise addition: Numerical attributes must have calibrated noise added to prevent inference
  4. k-anonymity verification: Every individual in the dataset must be indistinguishable from at least k-1 others (CNIL recommends k≥5)
  5. l-diversity verification: Sensitive attribute values must have adequate diversity within each equivalence class
  6. Re-identification risk assessment: Before publication, datasets must undergo re-identification risk assessment using documented methodology
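Categories 4 and 5 lend themselves to a mechanical check. The sketch below is a minimal illustration using only the standard library: the function names and the column names (`age_band`, `department`, `diagnosis`) are hypothetical, and it measures l-diversity in its simplest form, as the number of distinct sensitive values per equivalence class.

```python
from collections import defaultdict


def equivalence_classes(rows, quasi_identifiers):
    """Group rows that share the same quasi-identifier values."""
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[name] for name in quasi_identifiers)
        classes[key].append(row)
    return classes


def k_anonymity(rows, quasi_identifiers):
    """Size of the smallest equivalence class (0 for an empty dataset)."""
    classes = equivalence_classes(rows, quasi_identifiers)
    return min((len(group) for group in classes.values()), default=0)


def l_diversity(rows, quasi_identifiers, sensitive):
    """Smallest number of distinct sensitive values in any class."""
    classes = equivalence_classes(rows, quasi_identifiers)
    return min(
        (len({row[sensitive] for row in group}) for group in classes.values()),
        default=0,
    )


rows = (
    [{"age_band": "30-39", "department": "75", "diagnosis": d}
     for d in ["asthma", "asthma", "diabetes", "flu", "flu"]]
    + [{"age_band": "40-49", "department": "13", "diagnosis": d}
       for d in ["flu", "flu", "flu", "flu", "asthma"]]
)
qi = ["age_band", "department"]
print(k_anonymity(rows, qi))               # 5
print(l_diversity(rows, qi, "diagnosis"))  # 2
```

In this toy dataset both classes contain five rows, so the k≥5 threshold is met, but the second class holds only two distinct diagnoses, which shows why the two checks are kept separate.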

CNIL has explicitly found that simply removing the NIR and full name from a dataset is not sufficient anonymization. Additional quasi-identifiers (age, postal code, profession, medical specialty) must also be addressed.
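Generalizing quasi-identifiers and adding noise to numerical attributes might look like the sketch below. Everything here is illustrative: the function names, the `income` field, the 10-year age bands, and the noise scale are assumptions chosen for the example, not values taken from CNIL guidance. Laplace noise is drawn as the difference of two exponential samples, which is a standard construction.

```python
import random


def age_band(age: int, width: int = 10) -> str:
    """Generalize an exact age to a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def postal_prefix(postal_code: str, keep: int = 2) -> str:
    """Keep only the leading digits of a postal code (coarser location)."""
    return postal_code[:keep]


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def generalize(record: dict, rng: random.Random, income_scale: float = 500.0) -> dict:
    """Return a copy with coarser quasi-identifiers and a noisy income."""
    return {
        "age_band": age_band(record["age"]),
        "area": postal_prefix(record["postal_code"]),
        # Hypothetical numerical attribute; the scale must be calibrated
        # to the dataset and is not specified by the source.
        "income": round(record["income"] + laplace_noise(income_scale, rng)),
    }


rng = random.Random(42)  # seeded only to make the example repeatable
record = {"age": 37, "postal_code": "75011", "income": 32000}
print(generalize(record, rng))
```

The generalized output would still need the k-anonymity and l-diversity checks, since coarser values do not by themselves guarantee large enough equivalence classes.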

Bilingual French/Regional Language Context

France has a complex linguistic situation relevant to PII detection:

Metropolitan French: Standard French as spoken in France — primary language of all official documents.

DOM-TOM identifiers: Overseas territories (Martinique, Guadeloupe, Réunion, Guyane, Mayotte) have their own administrative codes in NIR numbers (97, 98 prefix for overseas departments) and local name conventions.

Alsatian context: The Alsace-Moselle region has historical German administrative conventions — German-origin names and some German administrative document formats appear in French administrative records.

Belgian French: For organizations operating across France and Belgium, French and Belgian identifier formats differ (NIR vs. Belgian national register number), and Belgian French uses slightly different name conventions.

French compliance therefore calls for four capabilities: NIR detection with mod-97 key validation, SIREN/SIRET detection with Luhn validation, French-language NER with accented character support (é, è, ê, ë, à, â, î, ô, û, ç, œ), and documented anonymization meeting CNIL's 6-category framework for AI training data.
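Accented character support often fails for a mundane reason: the same letter can be stored precomposed (é as one code point) or decomposed (e followed by a combining accent). The sketch below shows one possible preprocessing step, Unicode NFC normalization, applied before a simple word pattern. It is not an NER model, and the names `normalize_fr` and `FRENCH_WORD` are hypothetical.

```python
import re
import unicodedata

# Letters of the Latin-1 range plus Œ/œ, with internal hyphens or apostrophes.
FRENCH_WORD = re.compile(
    r"[A-Za-zÀ-ÖØ-öø-ÿŒœ]+(?:[-'][A-Za-zÀ-ÖØ-öø-ÿŒœ]+)*"
)


def normalize_fr(text: str) -> str:
    """Compose base letters and combining accents into single code points."""
    return unicodedata.normalize("NFC", text)


# Decomposed input, as sometimes produced by PDF extraction or macOS filenames.
raw = "He\u0301le\u0300ne Lefe\u0300vre"

print(FRENCH_WORD.findall(raw))
# ['He', 'le', 'ne', 'Lefe', 'vre']
print(FRENCH_WORD.findall(normalize_fr(raw)))
# ['Hélène', 'Lefèvre']
```

Without normalization the name is split at every accent, so any downstream name matcher sees fragments instead of tokens.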
