Datatilsynet Denmark: CPR-Number Modulus-11 Validation and Danish Healthcare GDPR Requirements

67% of generic NLP tools lack Danish CPR-number modulus-11 validation. Datatilsynet issued 14 healthcare enforcement decisions in 2024. Secondary use of health data requires documented anonymization validation procedures.

March 7, 2026 · 7 min read
Denmark Datatilsynet, CPR modulus-11, Danish healthcare GDPR, health data anonymization, Nordic compliance

Denmark's Datatilsynet issued 31 GDPR enforcement decisions in 2024, with 14 specifically involving healthcare data systems — a concentration reflecting the high stakes of Denmark's comprehensive national health data infrastructure and the technical failures that repeatedly expose patient data.

CPR-Number: The Modulus-11 Requirement

The CPR number (CPR-nummer, named after Det Centrale Personregister, the Danish Civil Registration System) has 10 digits in the format DDMMYY-XXXX. Digits 1-6 encode the birth date, and digits 7-10 are a sequence number whose last digit serves as the check digit. The final digit is validated using modulus-11 arithmetic:

Modulus-11 check: multiply digits 1-9 by weights (4,3,2,7,6,5,4,3,2), sum, take modulo 11. If result is 0, check digit = 0. If result is 1, the CPR is invalid (no valid check digit exists for this prefix). Otherwise, check digit = 11 minus result.

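A consequence is that some DDMMYY-XXXX patterns fail the check outright: any nine-digit prefix whose weighted sum leaves remainder 1 has no valid check digit. Tools that pattern-match 10-digit numbers formatted as DDMMYY-XXXX without modulus-11 validation generate false positives from date strings, reference numbers, and invoice codes. One caveat: since 2007 the Danish CPR Office has also issued some numbers that do not satisfy the check, so a failed check is best treated as strong evidence against a match rather than as proof.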
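The check-digit rule described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `cpr_check_digit` and `passes_modulus_11` are hypothetical, and the sketch tests only the arithmetic, not whether the first six digits form a real date.

```python
import re
from typing import Optional

# Weights for digits 1-9, as given in the rule above.
WEIGHTS = (4, 3, 2, 7, 6, 5, 4, 3, 2)


def cpr_check_digit(first_nine: str) -> Optional[int]:
    """Return the check digit for a nine-digit prefix, or None if none exists."""
    if not re.fullmatch(r"[0-9]{9}", first_nine):
        raise ValueError("expected exactly nine digits")
    remainder = sum(int(d) * w for d, w in zip(first_nine, WEIGHTS)) % 11
    if remainder == 0:
        return 0
    if remainder == 1:
        # The check digit would have to be 10, which is not a single digit.
        return None
    return 11 - remainder


def passes_modulus_11(candidate: str) -> bool:
    """True if the candidate has ten digits and the last one matches the rule."""
    digits = re.sub(r"[^0-9]", "", candidate)  # drop the dash and any spaces
    if len(digits) != 10:
        return False
    return cpr_check_digit(digits[:9]) == int(digits[9])
```

For example, `passes_modulus_11("070761-4285")` returns `True` for this made-up number, while changing the last digit to 6 makes the check fail. The prefix `010100001` has a weighted sum of 12, so the remainder is 1 and `cpr_check_digit` returns `None`.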

67% of generic NLP tools lack CPR modulus-11 implementation (Datatilsynet 2024). This detection failure is the single most cited technical inadequacy in Datatilsynet's healthcare enforcement actions.

Denmark's Health Data Research Ecosystem

Denmark's health registers — among the most complete longitudinal health datasets in the world — are linked through the CPR number. The CPR enables researchers to link:

  • Hospital discharge records (from 1977)
  • Prescription database (from 1995)
  • Cancer registry (from 1943)
  • Cause of death registry (from 1970)
  • Primary care diagnosis data (from 1990)

This linkability makes Danish health research world-class but creates a re-identification risk that Datatilsynet takes seriously: even "de-identified" datasets that retain CPR-linked attributes (age, sex, diagnosis, year) can be re-identified in combination with other datasets.

Datatilsynet's 2024 guidance on secondary health data use requires that organizations using these registers demonstrate:

Technical anonymization documentation: Not a policy statement, but technical documentation showing exactly which identifiers were removed, which quasi-identifiers were generalized, and what k-anonymity level was achieved in the output dataset.

Third-party validation for research datasets: For research datasets with more than 5,000 individuals, Datatilsynet recommends independent technical review of anonymization procedures.

Data minimization: Research dataset scope must match the documented research question. Datatilsynet has found multiple cases where researchers used complete national registers when a random sample or geographically limited dataset would have served the research purpose.
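To make the k-anonymity figure in the documentation requirement concrete, the sketch below computes it for a small table. It is an assumption-laden illustration: the column names, the sample rows, and the helper names (`k_anonymity`, `classes_below`) are all invented for the example, and a real assessment would cover far more than a single k value.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, str]


def equivalence_classes(records: Sequence[Record],
                        quasi_identifiers: Sequence[str]) -> Counter:
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def k_anonymity(records: Sequence[Record],
                quasi_identifiers: Sequence[str]) -> int:
    """k is the size of the smallest equivalence class (0 for an empty dataset)."""
    classes = equivalence_classes(records, quasi_identifiers)
    return min(classes.values()) if classes else 0


def classes_below(records: Sequence[Record],
                  quasi_identifiers: Sequence[str],
                  k: int) -> List[Tuple[str, ...]]:
    """List the quasi-identifier combinations shared by fewer than k records."""
    classes = equivalence_classes(records, quasi_identifiers)
    return sorted(key for key, size in classes.items() if size < k)


# Invented sample rows: birth year already generalized to a decade band.
SAMPLE = [
    {"birth_year": "1960-1969", "sex": "F", "diagnosis": "E11"},
    {"birth_year": "1960-1969", "sex": "F", "diagnosis": "E11"},
    {"birth_year": "1960-1969", "sex": "F", "diagnosis": "E11"},
    {"birth_year": "1970-1979", "sex": "M", "diagnosis": "E75.2"},
]
QUASI_IDENTIFIERS = ("birth_year", "sex", "diagnosis")
```

Here `k_anonymity(SAMPLE, QUASI_IDENTIFIERS)` is 1, because the single record with the rare diagnosis forms a class of its own, and `classes_below(SAMPLE, QUASI_IDENTIFIERS, 2)` names that class. Listing the undersized classes alongside the k value is the kind of detail that turns a policy statement into technical documentation.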

Specific Healthcare Enforcement Findings

Datatilsynet's 14 healthcare enforcement decisions in 2024 document recurring technical failures:

Case pattern 1: Hospital shares de-identified patient dataset with academic research partner for AI training. Dataset contains CPR birth date components, diagnosis codes, and treatment dates. Datatilsynet finds the combination enables re-identification of rare disease patients (the small-denominator problem: an unusual diagnosis shrinks the set of matching patients to a handful).

Case pattern 2: Health tech startup processes Danish patient data through US-based AI API for clinical documentation support. CPR numbers in medical notes are transmitted to US servers without adequate transfer mechanism and without prior CPR detection and removal.

Case pattern 3: Insurance company processes medical certificate data for disability claims. CPR numbers in scanned PDF certificates are not detected by the company's OCR-plus-extraction pipeline. OCR converts the image to text, but the text is then processed without CPR validation, and formatting artifacts in the OCR output cause many CPR numbers to be missed.

The OCR-plus-extraction failure mode is particularly common in healthcare contexts where documents are received as scanned images. CPR detection must work on OCR-processed text, which often introduces formatting inconsistencies (spaces inserted mid-number, dash position errors) that break simple pattern matching.
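One way to tolerate the OCR artifacts described above is to match loosely and validate strictly. The sketch below accepts ten digits with up to two stray spaces, tabs, or dashes between any pair, strips the separators, and keeps only candidates with a plausible day and month that also pass modulus-11. The names `find_cprs` and `redact_cprs`, the two-separator tolerance, and the `[CPR]` placeholder are assumptions made for illustration. The check uses the equivalent ten-weight form of the rule, where the check digit has weight 1 and the total must be divisible by 11.

```python
import re
from typing import List

# Weights for all ten digits; a number passes when the weighted sum % 11 == 0.
WEIGHTS = (4, 3, 2, 7, 6, 5, 4, 3, 2, 1)

# Ten digits, not embedded in a longer digit run, with up to two stray
# spaces, tabs, or dashes allowed between neighbouring digits.
CANDIDATE = re.compile(r"(?<![0-9])[0-9](?:[ \t-]{0,2}[0-9]){9}(?![0-9])")


def normalize(raw: str) -> str:
    """Strip everything except digits from a matched span."""
    return re.sub(r"[^0-9]", "", raw)


def is_probable_cpr(digits: str) -> bool:
    """Plausible day and month, plus the modulus-11 check."""
    if len(digits) != 10:
        return False
    day, month = int(digits[:2]), int(digits[2:4])
    if not (1 <= day <= 31 and 1 <= month <= 12):
        return False
    return sum(int(d) * w for d, w in zip(digits, WEIGHTS)) % 11 == 0


def find_cprs(ocr_text: str) -> List[str]:
    """Return validated candidates in canonical DDMMYY-XXXX form."""
    hits = []
    for match in CANDIDATE.finditer(ocr_text):
        digits = normalize(match.group())
        if is_probable_cpr(digits):
            hits.append(f"{digits[:6]}-{digits[6:]}")
    return hits


def redact_cprs(ocr_text: str, placeholder: str = "[CPR]") -> str:
    """Replace validated candidates, leaving other digit runs untouched."""
    def replace(match: "re.Match[str]") -> str:
        digits = normalize(match.group())
        return placeholder if is_probable_cpr(digits) else match.group()
    return CANDIDATE.sub(replace, ocr_text)
```

With the made-up number used here, `find_cprs("Pt. 0707 61-4285, faktura 2024-001234")` returns `["070761-4285"]`: the misplaced space is absorbed, and the invoice number is rejected because 24 is not a valid month. The loose pattern is a trade-off, since widening the separator tolerance catches more damaged numbers but leans harder on the validation step to filter out noise.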

For Danish healthcare GDPR compliance, the minimum requirements are: CPR detection with modulus-11 validation on both clean text and OCR-processed output, Danish-language NER (for example spaCy's da_core_news models), and technical anonymization documentation that meets Datatilsynet's 2024 secondary-use standards.
