Multi-Framework Presets: GDPR + HIPAA + CCPA in a Single Data Pipeline
The Challenge: Multi-Jurisdiction Compliance
Modern organizations have customers in many countries:
- EU customers → GDPR requirements
- US healthcare → HIPAA requirements
- California residents → CCPA/CPRA requirements
- Canada → PIPEDA requirements
If an organization runs a single data pipeline, it has to apply different anonymization rules depending on each customer's jurisdiction.
The Problem: Blanket vs. Location-Specific Anonymization
❌ Mistake 1: One-Size-Fits-All Approach
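```python
import hashlib

def hash_value(value):
    # Plain, unsalted SHA-256, applied identically for every jurisdiction
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

# Wrong: apply the same anonymization to every customer
def anonymize_customer(customer_data):
    return {
        'name': hash_value(customer_data['name']),
        'email': hash_value(customer_data['email']),
        'phone': customer_data['phone'][-4:]  # keep only the last 4 digits
    }

# Applied to everyone: EU, US, California
for customer in customers:
    anonymized = anonymize_customer(customer)
    process(anonymized)
```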
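Problem:
- EU (GDPR): hashing a name or email is pseudonymization, not anonymization. A plain hash can often be matched back to its input, so the output is still personal data and GDPR keeps applying; only data that can no longer be linked to a person falls outside GDPR (Recital 26).
- US (HIPAA): data de-identified under 45 CFR 164.514 leaves HIPAA's scope, and a covered entity may keep a secret re-identification code, so controlled linkage or reversal can be acceptable.
- California (CCPA/CPRA): hashing can be part of deidentification, but the statutory definition also requires reasonable technical measures, a public commitment not to re-identify, and contractual limits on recipients.
On top of that, GDPR, HIPAA, and CCPA differ on deletion, access, and other individual rights, which one blanket function cannot express.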
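The GDPR point is easy to demonstrate. A minimal sketch, assuming an attacker holds a list of candidate email addresses (the addresses below are invented): hashing every candidate with the same unsalted SHA-256 and comparing digests recovers the original value, which is why a plain hash behaves as a pseudonym rather than as anonymous data.

```python
import hashlib
from typing import List, Optional

def sha256_hex(value: str) -> str:
    """Plain (unsalted, unkeyed) SHA-256 hex digest."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

def reidentify(digest: str, candidates: List[str]) -> Optional[str]:
    """Return the candidate whose SHA-256 digest equals `digest`, or None."""
    lookup = {sha256_hex(c): c for c in candidates}  # precomputed dictionary
    return lookup.get(digest)

# Hypothetical candidate list, e.g. assembled from a public or leaked source
candidates = ["alice@example.com", "john@example.com", "bob@example.com"]

stored = sha256_hex("john@example.com")  # what the pipeline would keep
print(reidentify(stored, candidates))     # john@example.com
```

A keyed hash (HMAC) or random tokens block this particular attack for anyone without the key, but under GDPR the result is still pseudonymous for whoever holds the key.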
✅ Solution: Location-Aware Multi-Framework Presets
Step 1: Define Multi-Framework Presets
```json
{
  "presets": [
    {
      "preset_id": "GDPR_EU_2025",
      "jurisdiction": "EU",
      "framework": "GDPR",
      "applies_to": ["DE", "FR", "IT", "ES"],
      "rules": [
        {
          "entity_type": "PERSON",
          "operator": "replace",
          "new_value": "<PERSON>",
          "irreversible": true
        },
        {
          "entity_type": "EMAIL_ADDRESS",
          "operator": "hash",
          "algorithm": "SHA-256",
          "irreversible": true
        }
      ],
      "note": "applies_to is abbreviated; extend it to all EU/EEA member states"
    },
    {
      "preset_id": "HIPAA_US_2025",
      "jurisdiction": "US (Healthcare)",
      "framework": "HIPAA",
      "applies_to": ["Healthcare providers in US"],
      "rules": [
        {
          "entity_type": "PERSON",
          "operator": "replace",
          "new_value": "<PERSON>",
          "allow_reversal": false,
          "note": "Irreversible replacement is acceptable under HIPAA"
        },
        {
          "entity_type": "PATIENT_ID",
          "operator": "hash",
          "algorithm": "HMAC-SHA256",
          "secret_key": "organization_key",
          "note": "Deterministic, so records for the same patient can be linked"
        }
      ]
    },
    {
      "preset_id": "CCPA_CALIFORNIA_2025",
      "jurisdiction": "California",
      "framework": "CCPA/CPRA",
      "applies_to": ["California residents"],
      "rules": [
        {
          "entity_type": "PERSON",
          "operator": "hash",
          "algorithm": "SHA-256",
          "irreversible": true
        },
        {
          "entity_type": "EMAIL_ADDRESS",
          "operator": "hash",
          "algorithm": "SHA-256",
          "irreversible": true
        }
      ],
      "note": "CCPA allows hashing, but the user keeps the right to access the original data"
    }
  ]
}
```
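The pipeline in Step 2 calls `load_preset()` and `preset.to_operators()`, which the article does not define. One possible shape for them, as a hypothetical sketch: parse the preset JSON into a small `Preset` dataclass and translate each rule into an (operator name, params) pair. The mapping of `SHA-256` to `{"hash_type": "sha256"}` assumes Presidio's built-in `hash` operator; rules the sketch does not know how to map, such as `HMAC-SHA256`, raise an error instead of silently falling back to a weaker operator.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical Preset type: one way the article's load_preset()/to_operators()
# could be implemented; not a published API.
@dataclass
class Preset:
    preset_id: str
    framework: str
    rules: List[dict] = field(default_factory=list)

    def to_operators(self) -> Dict[str, Tuple[str, dict]]:
        """Map entity type -> (operator name, operator params)."""
        ops = {}
        for rule in self.rules:
            op = rule["operator"]
            if op == "replace":
                ops[rule["entity_type"]] = ("replace", {"new_value": rule["new_value"]})
            elif op == "hash" and rule.get("algorithm") == "SHA-256":
                ops[rule["entity_type"]] = ("hash", {"hash_type": "sha256"})
            else:
                # e.g. HMAC-SHA256 needs a keyed custom operator, so fail loudly
                raise ValueError(f"{self.preset_id}: unsupported rule {rule}")
        return ops

def load_presets(config_text: str) -> Dict[str, Preset]:
    """Parse the preset JSON into Preset objects keyed by preset_id."""
    data = json.loads(config_text)
    return {
        p["preset_id"]: Preset(p["preset_id"], p["framework"], p["rules"])
        for p in data["presets"]
    }

# Trimmed copy of the Step 1 presets
CONFIG = """
{"presets": [
  {"preset_id": "GDPR_EU_2025", "framework": "GDPR", "rules": [
    {"entity_type": "PERSON", "operator": "replace", "new_value": "<PERSON>"},
    {"entity_type": "EMAIL_ADDRESS", "operator": "hash", "algorithm": "SHA-256"}]},
  {"preset_id": "HIPAA_US_2025", "framework": "HIPAA", "rules": [
    {"entity_type": "PATIENT_ID", "operator": "hash", "algorithm": "HMAC-SHA256"}]}
]}
"""

presets = load_presets(CONFIG)
print(presets["GDPR_EU_2025"].to_operators())
```

With Presidio installed, each pair would be wrapped as `OperatorConfig(name, params)` from `presidio_anonymizer.entities` before being passed as `operators=`.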
Step 2: Implement a Location-Aware Pipeline
```python
from datetime import datetime, timezone

from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

# Abbreviated; extend to all EU/EEA member states
EU_COUNTRIES = {"DE", "FR", "IT", "ES"}

# load_preset, serialize_customer_data, log_anonymization and process are
# application-specific helpers (not part of Presidio) that you must supply.
def anonymize_by_jurisdiction(customer_data, jurisdiction):
    """Apply the correct preset based on jurisdiction."""
    # Determine preset. California uses the ISO 3166-2 code "US-CA";
    # a bare "CA" is the ISO country code for Canada.
    if jurisdiction in EU_COUNTRIES:
        preset = load_preset("GDPR_EU_2025")
    elif jurisdiction == "US" and customer_data.get("type") == "healthcare":
        preset = load_preset("HIPAA_US_2025")
    elif jurisdiction == "US-CA":
        preset = load_preset("CCPA_CALIFORNIA_2025")
    else:
        preset = load_preset("DEFAULT")

    # Detect PII (analyze() requires a language). PATIENT_ID is not a
    # built-in Presidio entity; detecting it needs a custom recognizer.
    text = serialize_customer_data(customer_data)
    results = analyzer.analyze(text=text, language="en")

    # Anonymize with the jurisdiction-specific operators
    # (a dict of entity type -> OperatorConfig)
    anonymized = anonymizer.anonymize(
        text=text,
        analyzer_results=results,
        operators=preset.to_operators()
    )

    # Log jurisdiction + preset used (for audit), with a UTC timestamp
    log_anonymization(
        customer_id=customer_data["id"],
        jurisdiction=jurisdiction,
        preset_used=preset.preset_id,
        timestamp=datetime.now(timezone.utc)
    )
    return anonymized

# Usage
customers = [
    {"id": "CUST-001", "name": "John Smith", "email": "john@example.com", "jurisdiction": "DE"},
    {"id": "CUST-002", "name": "Jane Doe", "email": "jane@example.com", "jurisdiction": "US", "type": "healthcare"},
    {"id": "CUST-003", "name": "Bob Johnson", "email": "bob@example.com", "jurisdiction": "US-CA"},
]
for customer in customers:
    anonymized = anonymize_by_jurisdiction(customer, customer["jurisdiction"])
    process(anonymized)
```
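The branching that picks a preset is the part of this pipeline most likely to go wrong, so it helps to isolate it as a pure function that can be unit-tested without Presidio. A minimal sketch; `select_preset_id` is a hypothetical name, and the region codes assume ISO 3166 (country codes such as `DE`, plus the subdivision code `US-CA` for California). A bare `CA` is Canada, which maps to PIPEDA and, with no PIPEDA preset defined, falls through to `DEFAULT`.

```python
# Hypothetical, Presidio-free version of the preset selection step.
EU_COUNTRIES = {"DE", "FR", "IT", "ES"}  # abbreviated; extend to all EU/EEA states

def select_preset_id(jurisdiction: str, customer_type: str = "") -> str:
    """Return the preset_id for an ISO 3166 region code.

    "DE" is a country code, "US-CA" is California, and a bare "CA" is Canada.
    """
    if jurisdiction in EU_COUNTRIES:
        return "GDPR_EU_2025"
    if jurisdiction == "US" and customer_type == "healthcare":
        return "HIPAA_US_2025"
    if jurisdiction == "US-CA":
        return "CCPA_CALIFORNIA_2025"
    return "DEFAULT"  # includes Canada ("CA") until a PIPEDA preset exists

for code, kind in [("DE", ""), ("US", "healthcare"), ("US-CA", ""), ("CA", "")]:
    print(code, select_preset_id(code, kind))
```

Keeping the selection pure also makes it easy to add a table-driven test per jurisdiction whenever a preset is added or a region code changes.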
Step 3: Audit Trail per Framework
```json
{
  "audit_log": [
    {
      "timestamp": "2025-03-08T10:15:00Z",
      "customer_id": "CUST-001",
      "jurisdiction": "DE",
      "preset_used": "GDPR_EU_2025",
      "entities_detected": ["PERSON", "EMAIL_ADDRESS"],
      "entities_anonymized": ["PERSON", "EMAIL_ADDRESS"],
      "framework": "GDPR"
    },
    {
      "timestamp": "2025-03-08T10:15:01Z",
      "customer_id": "CUST-002",
      "jurisdiction": "US",
      "preset_used": "HIPAA_US_2025",
      "entities_detected": ["PERSON", "PATIENT_ID"],
      "entities_anonymized": ["PERSON", "PATIENT_ID"],
      "framework": "HIPAA"
    }
  ]
}
```
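A minimal sketch of how one entry in this log could be produced, assuming the entity lists come from the analyzer results as plain entity-type strings; `make_audit_entry` is a hypothetical helper. It records the time in UTC in the same `...Z` form as the log above and returns a plain dict, so entries can be serialized with `json.dumps` and appended to a write-once store.

```python
import json
from datetime import datetime, timezone
from typing import List, Optional

def make_audit_entry(customer_id: str, jurisdiction: str, preset_id: str,
                     framework: str, detected: List[str], anonymized: List[str],
                     now: Optional[datetime] = None) -> dict:
    """Build one audit record in the shape of the log above (hypothetical helper)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return {
        # ISO 8601 in UTC, seconds precision, with a trailing "Z"
        "timestamp": now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "customer_id": customer_id,
        "jurisdiction": jurisdiction,
        "preset_used": preset_id,
        # de-duplicate while keeping detection order
        "entities_detected": list(dict.fromkeys(detected)),
        "entities_anonymized": list(dict.fromkeys(anonymized)),
        "framework": framework,
    }

entry = make_audit_entry("CUST-001", "DE", "GDPR_EU_2025", "GDPR",
                         ["PERSON", "EMAIL_ADDRESS"], ["PERSON", "EMAIL_ADDRESS"],
                         now=datetime(2025, 3, 8, 10, 15, tzinfo=timezone.utc))
print(json.dumps(entry))
```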
The Compliance Matrix per Framework
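| Framework | Anonymization | Reversibility | Right to Access | Right to Delete |
|---|---|---|---|---|
| GDPR | Truly anonymized data is out of scope (Recital 26) | Pseudonymization is allowed, but pseudonymized data is still personal data | Article 15, on the original (not the anonymized) data | Article 17 |
| HIPAA | De-identification (Safe Harbor or Expert Determination, 45 CFR 164.514) or pseudonymization | A secret re-identification code is allowed (45 CFR 164.514(c)) | On the original (45 CFR 164.524) | No general right to deletion; right to amend (45 CFR 164.526) |
| CCPA | Any method that meets the statutory "deidentified" definition | Not required to support | On the original | Yes; respond within 45 days (extendable once by 45 days) |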
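The HIPAA row's "secret key" idea matches the `HIPAA_US_2025` preset, which hashes `PATIENT_ID` with HMAC-SHA256. A minimal sketch using Python's standard `hmac` module, with a hard-coded key for illustration only (an assumption; a real key would come from a secrets manager, never from the preset file): the same patient ID always yields the same token under one key, so records link together, while someone without the key cannot recompute tokens from a list of candidate IDs.

```python
import hashlib
import hmac

def pseudonymize(patient_id: str, key: bytes) -> str:
    """Deterministic keyed token for a patient ID (HMAC-SHA256, hex-encoded)."""
    return hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()

# Hypothetical key for illustration; load it from a secrets manager in practice.
ORG_KEY = b"example-organization-key"

t1 = pseudonymize("PAT-12345", ORG_KEY)
t2 = pseudonymize("PAT-12345", ORG_KEY)         # same patient, same key
t3 = pseudonymize("PAT-12345", b"another-key")  # same patient, different key

print(t1 == t2)  # True: records for one patient link together
print(t1 == t3)  # False: without the right key the token cannot be recomputed
```

HMAC is one-way. If the organization must map a token back to a patient, it keeps a separate, access-controlled table of token to patient ID.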
A Real-World Case: Multi-National SaaS
A SaaS company has customers in the EU, the US, and California:
Data pipeline:
[Customer signs up] → Jurisdiction detection → Location-aware anonymization
↓
EU customer (DE) → GDPR preset (name replaced, email hashed)
US customer (healthcare) → HIPAA preset (name replaced, patient ID as keyed-hash pseudonym)
California customer (US-CA) → CCPA preset (name and email hashed)
↓
Audit log: 3 different presets used, each recorded with its framework
Compliance status:
- ✅ GDPR compliant (DE customer)
- ✅ HIPAA compliant (US healthcare customer)
- ✅ CCPA compliant (California customer)
Regulator audits:
- GDPR data protection authority (DPA): "Show the GDPR_EU_2025 preset and audit trail" ✅
- HHS Office for Civil Rights (OCR) auditor: "Show the HIPAA_US_2025 preset and reversibility" ✅
- California Attorney General (AG) auditor: "Show the CCPA_CALIFORNIA_2025 preset and user rights" ✅
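Each auditor asks about one framework only, so the audit log has to be filterable by framework. A minimal sketch over in-memory entries copied from the Step 3 audit log (trimmed to the relevant fields); `entries_for_framework` and `presets_used` are hypothetical helpers, and a production system would run the same filter against its log store instead of a Python list.

```python
from collections import Counter
from typing import Dict, List

# Entries from the Step 3 audit log, trimmed to the fields used here
AUDIT_LOG: List[Dict] = [
    {"timestamp": "2025-03-08T10:15:00Z", "customer_id": "CUST-001",
     "preset_used": "GDPR_EU_2025", "framework": "GDPR"},
    {"timestamp": "2025-03-08T10:15:01Z", "customer_id": "CUST-002",
     "preset_used": "HIPAA_US_2025", "framework": "HIPAA"},
]

def entries_for_framework(log: List[Dict], framework: str) -> List[Dict]:
    """Return only the entries relevant to one framework's auditor."""
    return [e for e in log if e["framework"] == framework]

def presets_used(log: List[Dict]) -> Counter:
    """Count how often each preset was applied."""
    return Counter(e["preset_used"] for e in log)

print(entries_for_framework(AUDIT_LOG, "GDPR"))
print(presets_used(AUDIT_LOG))
```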
Best Practices
- Define presets per framework - GDPR, HIPAA, CCPA, etc.
- Implement location detection - determine jurisdiction from customer data
- Apply correct preset - based on jurisdiction
- Audit per framework - log which preset was used
- Update regularly - as regulations change
Multi-framework presets are essential for organizations operating globally.