Denmark's Datatilsynet issued 31 GDPR enforcement decisions in 2024, with 14 specifically involving healthcare data systems — a concentration reflecting the high stakes of Denmark's comprehensive national health data infrastructure and the technical failures that repeatedly expose patient data.
CPR-Number: The Modulus-11 Requirement
The CPR number (CPR-nummer, assigned through Det Centrale Personregister, Denmark's Civil Registration System) has 10 digits in the format DDMMYY-XXXX. Digits 1-6 encode the date of birth; digits 7-10 form a serial number whose final digit doubles as a check digit. The check digit is validated using modulus-11 arithmetic:
Modulus-11 check: multiply digits 1-9 by weights (4,3,2,7,6,5,4,3,2), sum, take modulo 11. If result is 0, check digit = 0. If result is 1, the CPR is invalid (no valid check digit exists for this prefix). Otherwise, check digit = 11 minus result.
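A minimal Python sketch of the rule above, for illustration only. The function names `cpr_check_digit` and `passes_modulus_11` are hypothetical, and the code checks only the arithmetic, not whether digits 1-6 form a real date.

```python
from typing import Optional

# Weights applied to CPR digits 1-9, as given in the modulus-11 rule.
WEIGHTS = (4, 3, 2, 7, 6, 5, 4, 3, 2)


def cpr_check_digit(first_nine: str) -> Optional[int]:
    """Expected 10th digit for a 9-digit prefix, or None if no digit can work."""
    if len(first_nine) != 9 or not first_nine.isdecimal():
        raise ValueError("expected exactly 9 digits")
    remainder = sum(int(d) * w for d, w in zip(first_nine, WEIGHTS)) % 11
    if remainder == 0:
        return 0
    if remainder == 1:
        return None  # 11 - 1 = 10 is not a single digit, so no valid CPR exists
    return 11 - remainder


def passes_modulus_11(cpr: str) -> bool:
    """True if a DDMMYY-XXXX or DDMMYYXXXX string satisfies the check."""
    digits = cpr.replace("-", "")
    if len(digits) != 10 or not digits.isdecimal():
        return False
    expected = cpr_check_digit(digits[:9])
    return expected is not None and expected == int(digits[9])


print(cpr_check_digit("070761428"))      # 5
print(passes_modulus_11("070761-4285"))  # True
print(cpr_check_digit("070761420"))      # None
```

For the prefix 070761428 the weighted sum is 149, which leaves remainder 6, so the expected check digit is 5. The prefix 070761420 leaves remainder 1, so no tenth digit can make it valid.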
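Under this scheme, some DDMMYY-XXXX patterns can never be valid CPR numbers (those whose first nine digits leave a remainder of 1), and for every other prefix only one of the ten possible final digits passes. A random 10-digit string therefore passes with a probability of roughly 1/11, so for digit strings that behave roughly at random, modulus-11 validation discards about 91% of date-shaped numbers that are not CPR numbers. Tools that pattern-match DDMMYY-XXXX without modulus-11 validation generate false positives from date strings, reference numbers, and invoice codes. One caveat: since 2007 the CPR office has also issued some numbers without modulus-11 control, for certain birth dates on which the valid serial numbers ran out, so a strict modulus-11 filter accepts a small number of false negatives in exchange for far fewer false positives.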
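To show how the check suppresses false positives in free text, here is a sketch that pairs a simple candidate regex with a day/month plausibility test and the modulus-11 check. The regex, the `find_cpr` name, and the choice to treat any check failure as a non-match are assumptions for illustration; the date test ignores the century, so it accepts 29 February for every year.

```python
import re
from typing import List

WEIGHTS = (4, 3, 2, 7, 6, 5, 4, 3, 2, 1)  # weight 1 on the check digit
DAYS_IN_MONTH = (31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)  # Feb allows 29

# Six digits, optional dash, four digits, not embedded in a longer digit run.
CANDIDATE = re.compile(r"(?<!\d)(\d{6})-?(\d{4})(?!\d)")


def plausible_date(ddmmyy: str) -> bool:
    day, month = int(ddmmyy[:2]), int(ddmmyy[2:4])
    return 1 <= month <= 12 and 1 <= day <= DAYS_IN_MONTH[month - 1]


def mod11_ok(digits: str) -> bool:
    # Equivalent to the check-digit rule: the full weighted sum is divisible by 11.
    return sum(int(d) * w for d, w in zip(digits, WEIGHTS)) % 11 == 0


def find_cpr(text: str) -> List[str]:
    """Return matched CPR strings that pass both the date and modulus-11 tests."""
    hits = []
    for m in CANDIDATE.finditer(text):
        if plausible_date(m.group(1)) and mod11_ok(m.group(1) + m.group(2)):
            hits.append(m.group(0))
    return hits


print(find_cpr("Patient 070761-4285 seen; invoice 311299-1234; ref 131313-0000."))
# ['070761-4285']
```

The invoice code has a plausible date but fails modulus-11 (its weighted sum is 150), and the reference number is rejected because 13 is not a valid month.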
According to Datatilsynet (2024), 67% of generic NLP tools do not implement modulus-11 validation for CPR numbers. This detection gap is the single most cited technical inadequacy in Datatilsynet's healthcare enforcement actions.
Denmark's Health Data Research Ecosystem
Denmark's health registers — among the most complete longitudinal health datasets in the world — are linked through the CPR number. The CPR enables researchers to link:
- Hospital discharge records (from 1977)
- Prescription database (from 1995)
- Cancer registry (from 1943)
- Cause of death registry (from 1970)
- Primary care diagnosis data (from 1990)
This linkability makes Danish health research world-class but creates a re-identification risk that Datatilsynet takes seriously: even "de-identified" datasets that retain CPR-linked attributes (age, sex, diagnosis, year) can be re-identified in combination with other datasets.
Datatilsynet's 2024 guidance on secondary health data use requires that organizations using these registers demonstrate:
Technical anonymization documentation: Not a policy statement, but technical documentation showing exactly which identifiers were removed, which quasi-identifiers were generalized, and what k-anonymity level was achieved in the output dataset.
Third-party validation for research datasets: For research datasets with more than 5,000 individuals, Datatilsynet recommends independent technical review of anonymization procedures.
Data minimization: Research dataset scope must match the documented research question. Datatilsynet has found multiple cases where researchers used complete national registers when a random sample or geographically limited dataset would have served the research purpose.
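The k-anonymity level named in the documentation requirement is the size of the smallest group of records that share the same quasi-identifier values. A minimal sketch of measuring it, assuming records are dicts and the analyst chooses the quasi-identifier columns; the function names, column names, and values below are hypothetical.

```python
from collections import Counter
from typing import Dict, Mapping, Sequence, Tuple


def equivalence_classes(rows: Sequence[Mapping[str, object]],
                        quasi_identifiers: Sequence[str]) -> Counter:
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)


def k_anonymity(rows, quasi_identifiers) -> int:
    """k = size of the smallest equivalence class."""
    classes = equivalence_classes(rows, quasi_identifiers)
    if not classes:
        raise ValueError("dataset is empty")
    return min(classes.values())


def undersized_classes(rows, quasi_identifiers, k: int) -> Dict[Tuple, int]:
    """Equivalence classes with fewer than k members (re-identification risk)."""
    classes = equivalence_classes(rows, quasi_identifiers)
    return {key: n for key, n in classes.items() if n < k}


# Hypothetical output dataset: birth year, sex, diagnosis code.
rows = [
    {"birth_year": 1961, "sex": "M", "diagnosis": "DX_COMMON"},
    {"birth_year": 1961, "sex": "M", "diagnosis": "DX_COMMON"},
    {"birth_year": 1961, "sex": "F", "diagnosis": "DX_COMMON"},
    {"birth_year": 1961, "sex": "F", "diagnosis": "DX_COMMON"},
    {"birth_year": 1961, "sex": "F", "diagnosis": "DX_RARE"},
]
print(k_anonymity(rows, ["birth_year", "sex"]))               # 2
print(k_anonymity(rows, ["birth_year", "sex", "diagnosis"]))  # 1
print(undersized_classes(rows, ["birth_year", "sex", "diagnosis"], 2))
# {(1961, 'F', 'DX_RARE'): 1}
```

Birth year and sex alone give k = 2, but adding the diagnosis drops k to 1 because a single record carries the rare code.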
Specific Healthcare Enforcement Findings
Datatilsynet's 14 healthcare enforcement decisions in 2024 document recurring technical failures:
Case pattern 1: Hospital shares de-identified patient dataset with academic research partner for AI training. Dataset contains CPR birth date components, diagnosis codes, and treatment dates. Datatilsynet finds the combination enables re-identification of rare disease patients (small denominator problem — unusual diagnoses narrow identification significantly).
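One common mitigation for this pattern is to generalize the quasi-identifiers and then suppress records that still fall into small groups. The sketch below illustrates that technique under assumed choices; it is not a method prescribed by Datatilsynet. Birth dates become 10-year cohorts, treatment dates become years, ICD-10 codes are truncated to the part before the dot, and `k` is a parameter the data controller would have to justify. The function names and records are hypothetical.

```python
from collections import Counter
from datetime import date
from typing import Dict, List, Tuple

Key = Tuple[int, int, str]


def generalize(record: Dict) -> Key:
    """Coarsen quasi-identifiers (the granularity here is an illustrative assumption)."""
    cohort = record["birth_date"].year // 10 * 10    # e.g. 1961 -> 1960
    treatment_year = record["treatment_date"].year    # exact date -> year only
    category = record["diagnosis"].split(".")[0]      # "E11.9" -> "E11"
    return (cohort, treatment_year, category)


def generalize_and_suppress(records: List[Dict], k: int) -> Tuple[List[Key], int]:
    """Keep generalized rows whose class has >= k members; report how many were dropped."""
    rows = [generalize(r) for r in records]
    sizes = Counter(rows)
    kept = [row for row in rows if sizes[row] >= k]
    return kept, len(rows) - len(kept)


records = [
    {"birth_date": date(1961, 7, 7), "treatment_date": date(2023, 3, 1), "diagnosis": "E11.9"},
    {"birth_date": date(1964, 2, 10), "treatment_date": date(2023, 9, 15), "diagnosis": "E11.6"},
    {"birth_date": date(1968, 11, 30), "treatment_date": date(2023, 5, 20), "diagnosis": "E11.9"},
    # Illustrative record standing in for a rare diagnosis.
    {"birth_date": date(1962, 1, 1), "treatment_date": date(2023, 4, 4), "diagnosis": "E75.2"},
]
kept, suppressed = generalize_and_suppress(records, k=3)
print(kept)        # [(1960, 2023, 'E11'), (1960, 2023, 'E11'), (1960, 2023, 'E11')]
print(suppressed)  # 1
```

Generalization alone leaves the rare-diagnosis record in a class of one, so suppression removes it; coarser generalization would be the alternative if dropping records biases the analysis.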
Case pattern 2: Health tech startup processes Danish patient data through US-based AI API for clinical documentation support. CPR numbers in medical notes are transmitted to US servers without adequate transfer mechanism and without prior CPR detection and removal.
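The missing step in this pattern is detection and removal before the text leaves the organization. A minimal sketch, assuming notes are plain strings: modulus-11-valid CPR candidates are replaced with placeholders, and the placeholder-to-CPR mapping is returned so it can stay on local systems. The `redact_cpr` name, the `strict` flag, and the `[CPR_n]` token format are hypothetical.

```python
import re
from typing import Dict, Tuple

WEIGHTS = (4, 3, 2, 7, 6, 5, 4, 3, 2, 1)  # weight 1 on the check digit
CPR_LIKE = re.compile(r"(?<!\d)(\d{6})-?(\d{4})(?!\d)")


def _passes_mod11(digits: str) -> bool:
    return sum(int(d) * w for d, w in zip(digits, WEIGHTS)) % 11 == 0


def redact_cpr(note: str, strict: bool = True) -> Tuple[str, Dict[str, str]]:
    """Replace CPR candidates with [CPR_n] tokens; return (redacted text, token -> CPR)."""
    tokens: Dict[str, str] = {}  # CPR digits -> token, so repeated numbers share a token

    def _replace(m: "re.Match[str]") -> str:
        digits = m.group(1) + m.group(2)
        if strict and not _passes_mod11(digits):
            return m.group(0)  # leave non-validating numbers untouched
        if digits not in tokens:
            tokens[digits] = f"[CPR_{len(tokens) + 1}]"
        return tokens[digits]

    redacted = CPR_LIKE.sub(_replace, note)
    return redacted, {token: digits for digits, token in tokens.items()}


text, mapping = redact_cpr("Pt 070761-4285 referred. Follow-up 0707614285. Invoice 3112991234.")
print(text)     # Pt [CPR_1] referred. Follow-up [CPR_1]. Invoice 3112991234.
print(mapping)  # {'[CPR_1]': '0707614285'}
```

Setting `strict=False` redacts every ten-digit candidate regardless of the check. For outbound redaction, where a missed CPR number costs more than an over-redacted invoice code, that trade-off may be preferable.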
Case pattern 3: Insurance company processes medical certificate data for disability claims. CPR numbers in scanned PDF certificates are not detected by the company's OCR-plus-extraction pipeline: OCR converts the images to text, but the extraction step applies no CPR-aware matching or validation, so formatting artifacts in the OCR output cause many CPR numbers to be missed.
The OCR-plus-extraction failure mode is particularly common in healthcare contexts where documents are received as scanned images. CPR detection must work on OCR-processed text, which often introduces formatting inconsistencies (spaces inserted mid-number, dash position errors) that break simple pattern matching.
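A sketch of OCR-tolerant matching, under stated assumptions: up to two separator characters (spaces, tabs, or assorted Unicode dashes) may appear between any two digits, and candidates are validated only after the separators are stripped. The zero-width lookahead makes the scanner examine overlapping candidates, so a failed candidate that swallows a neighbouring number cannot hide a real CPR number. The pattern and the `find_cpr_ocr` name are hypothetical, and the sketch does not try to correct misrecognized characters.

```python
import re
from typing import List, Tuple

WEIGHTS = (4, 3, 2, 7, 6, 5, 4, 3, 2, 1)  # weight 1 on the check digit
DAYS_IN_MONTH = (31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)  # Feb allows 29
# Up to two OCR separators between digits: space, tab, hyphen, U+2010-U+2015, minus sign.
SEP = r"[ \t\-\u2010-\u2015\u2212]{0,2}"
# Lookahead only, so every start position is tried and overlapping candidates are kept.
CANDIDATE = re.compile(rf"(?<!\d)(?=(\d(?:{SEP}\d){{9}})(?!\d))")


def valid_cpr(digits: str) -> bool:
    day, month = int(digits[:2]), int(digits[2:4])
    if not (1 <= month <= 12 and 1 <= day <= DAYS_IN_MONTH[month - 1]):
        return False
    return sum(int(d) * w for d, w in zip(digits, WEIGHTS)) % 11 == 0


def find_cpr_ocr(text: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, normalized DDMMYY-XXXX) for each valid candidate."""
    found = []
    for m in CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", m.group(1))  # strip OCR separators
        if valid_cpr(digits):
            found.append((m.start(1), m.end(1), f"{digits[:6]}-{digits[6:]}"))
    return found


sample = "Pt. 07 0761 -4285, tlf 1234 070761\u20134285"
for start, end, cpr in find_cpr_ocr(sample):
    print(repr(sample[start:end]), cpr)
```

Both occurrences normalize to `070761-4285`, and the spurious candidate `1234 070761` is rejected because 34 is not a valid month. Returning character spans lets the same routine drive redaction of the original OCR text.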
For Danish healthcare GDPR compliance, the minimum requirements are: CPR detection with modulus-11 validation that works on both clean text and OCR-processed output, Danish-language NER (for example, spaCy's da_core_news pipelines), and technical anonymization documentation that meets Datatilsynet's 2024 secondary-use standards.
Sources: