Multi-Framework Presets: Anonymizing for GDPR, HIPAA, and CCPA at Once
A global healthcare organization may be subject to:
- GDPR (EU): protects the personal data of individuals in the EU
- HIPAA (US): protects protected health information (PHI)
- CCPA (California): protects the privacy of California consumers
Each regulation has a slightly different definition of PII, different anonymization requirements, and a different list of "quasi-identifiers".
- GDPR Recital 26: anonymous information is information that does not relate to an identified or identifiable natural person.
- HIPAA Safe Harbor: remove 18 specified identifiers (names, SSNs, dates of birth, etc.).
- CCPA § 1798.105: consumers have the right to request deletion of their personal information.
Differences in PII Definitions
| Data element | GDPR | HIPAA | CCPA |
|---|---|---|---|
| Full name | Required | Required | Required |
| Date of birth | Required | Required | Optional |
|  | Required | Optional | Required |
| Medical record number | N/A | Required | N/A |
| Address | Required | Required | Required |
| Social Security number | Required | Required | Required |
| Cookie / device ID | Required | No | Required |
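The table can also be kept as data, so that the question "does any framework in scope require this element to be anonymized?" is computed rather than answered by hand. The following is a minimal sketch covering only a few of the rows; the names `REQUIREMENTS`, `must_anonymize`, and `frameworks_requiring` are hypothetical and not part of any library.

```python
# Hypothetical encoding of a few table rows: element -> {framework: status}.
REQUIREMENTS = {
    "Full name": {"GDPR": "required", "HIPAA": "required", "CCPA": "required"},
    "Date of birth": {"GDPR": "required", "HIPAA": "required", "CCPA": "optional"},
    "Medical record number": {"GDPR": "n/a", "HIPAA": "required", "CCPA": "n/a"},
    "Cookie / device ID": {"GDPR": "required", "HIPAA": "no", "CCPA": "required"},
}


def must_anonymize(element, frameworks):
    """Return True if at least one of the given frameworks marks the element as required."""
    statuses = REQUIREMENTS[element]
    return any(statuses.get(framework) == "required" for framework in frameworks)


def frameworks_requiring(element):
    """Return the sorted list of frameworks that mark the element as required."""
    return sorted(
        framework
        for framework, status in REQUIREMENTS[element].items()
        if status == "required"
    )
```

For example, `must_anonymize("Medical record number", ["GDPR", "CCPA"])` is false, but it becomes true as soon as HIPAA is in scope.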
Building a Multi-Framework Preset
Step 1: Define Entities per Framework
# Entity names are illustrative. Assuming Microsoft Presidio, built-in recognizers
# use names such as US_SSN, DATE_TIME, and LOCATION; names without a built-in
# recognizer (for example MEDICAL_RECORD_NUMBER) need a custom recognizer.
FRAMEWORK_ENTITIES = {
    "GDPR": {
        "required": [
            "PERSON",
            "EMAIL_ADDRESS",
            "PHONE_NUMBER",
            "ADDRESS",
            "SSN",
            "CREDIT_CARD",
            "DATE_OF_BIRTH",
            "IP_ADDRESS"
        ],
        "quasi_identifiers": ["COMPANY_NAME", "JOB_TITLE", "LOCATION"]
    },
    "HIPAA": {
        "required": [
            "PERSON",
            "SSN",
            "MEDICAL_RECORD_NUMBER",
            "DATE_OF_BIRTH",
            "PHONE_NUMBER",
            "FAX_NUMBER",
            "EMAIL_ADDRESS",
            "ADDRESS",
            "BIOMETRIC_ID"
        ],
        "quasi_identifiers": ["FACILITY_NAME", "PROVIDER_ID", "HEALTH_PLAN_ID"]
    },
    "CCPA": {
        "required": [
            "PERSON",
            "EMAIL_ADDRESS",
            "ADDRESS",
            "PHONE_NUMBER",
            "COOKIES",
            "IP_ADDRESS",
            "DEVICE_ID",
            "BIOMETRIC_DATA"
        ],
        "quasi_identifiers": ["PURCHASE_HISTORY", "BROWSING_HISTORY", "LOCATION"]
    }
}
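Because the same entity type can appear under several frameworks, it can help to see which frameworks cover which entity before running detection. The sketch below is one way to do that; `SAMPLE_ENTITIES` is a trimmed, hypothetical stand-in for the mapping above so that the snippet runs on its own, and `entities_by_framework` is an illustrative helper name.

```python
from collections import defaultdict

# Trimmed stand-in for FRAMEWORK_ENTITIES so the sketch is self-contained.
SAMPLE_ENTITIES = {
    "GDPR": {"required": ["PERSON", "IP_ADDRESS"], "quasi_identifiers": ["LOCATION"]},
    "HIPAA": {"required": ["PERSON", "MEDICAL_RECORD_NUMBER"], "quasi_identifiers": ["FACILITY_NAME"]},
    "CCPA": {"required": ["PERSON", "IP_ADDRESS"], "quasi_identifiers": ["LOCATION"]},
}


def entities_by_framework(framework_entities, frameworks):
    """Map each entity type to the sorted list of frameworks that list it."""
    coverage = defaultdict(set)
    for framework in frameworks:
        spec = framework_entities[framework]
        for entity in spec["required"] + spec["quasi_identifiers"]:
            coverage[entity].add(framework)
    # Sort keys and values so the result is deterministic.
    return {entity: sorted(names) for entity, names in sorted(coverage.items())}


coverage = entities_by_framework(SAMPLE_ENTITIES, ["GDPR", "HIPAA", "CCPA"])
```

The keys of `coverage` are the union of entity types to detect, and each value records which frameworks care about that type.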
Step 2: Define Operators per Framework
from presidio_anonymizer.entities import OperatorConfig

FRAMEWORK_OPERATORS = {
    "GDPR": {
        # GDPR anonymization must be irreversible
        "PERSON": OperatorConfig("redact"),  # remove entirely
        "EMAIL_ADDRESS": OperatorConfig("redact"),
        "SSN": OperatorConfig("redact"),
        # Quasi-identifiers can be replaced with a placeholder
        "COMPANY_NAME": OperatorConfig("replace", params={"new_value": "[COMPANY]"}),
        "JOB_TITLE": OperatorConfig("replace", params={"new_value": "[JOB]"})
    },
    "HIPAA": {
        # HIPAA Safe Harbor: remove the 18 identifiers
        "PERSON": OperatorConfig("redact"),
        "MEDICAL_RECORD_NUMBER": OperatorConfig("redact"),
        "DATE_OF_BIRTH": OperatorConfig("redact"),
        "PHONE_NUMBER": OperatorConfig("redact"),
        # Quasi-identifiers can be kept or replaced
        "FACILITY_NAME": OperatorConfig("replace", params={"new_value": "[FACILITY]"})
    },
    "CCPA": {
        # CCPA: consumers have the right to deletion
        "PERSON": OperatorConfig("redact"),
        "EMAIL_ADDRESS": OperatorConfig("redact"),
        "COOKIES": OperatorConfig("redact"),
        "DEVICE_ID": OperatorConfig("redact"),
        "BROWSING_HISTORY": OperatorConfig("redact")
    }
}
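To make the difference between the two operators concrete, here is a library-free sketch of what "redact" (delete the match) and "replace" (substitute a fixed value) do to a string. It is not Presidio's implementation: `Span` and `apply_operators` are hypothetical names, operators are plain tuples, the character offsets are supplied by hand, and spans are assumed not to overlap.

```python
from typing import NamedTuple


class Span(NamedTuple):
    entity_type: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def apply_operators(text, spans, operators):
    """Apply ("redact", None) or ("replace", new_value) to each span.

    Spans are processed right to left so that edits near the end of the
    string do not shift the offsets of spans that come earlier.
    Entity types without an operator fall back to a "[ENTITY_TYPE]" placeholder.
    """
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        default = ("replace", f"[{span.entity_type}]")
        action, value = operators.get(span.entity_type, default)
        substitute = "" if action == "redact" else value
        text = text[:span.start] + substitute + text[span.end:]
    return text


text = "John Smith works at Acme"
spans = [Span("PERSON", 0, 10), Span("COMPANY_NAME", 20, 24)]
gdpr_like = {"PERSON": ("redact", None), "COMPANY_NAME": ("replace", "[COMPANY]")}
result = apply_operators(text, spans, gdpr_like)
```

Here `result` no longer contains the name at all, while the company name is swapped for `[COMPANY]`; with an empty operator map, both spans would fall back to their `[ENTITY_TYPE]` placeholders.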
Step 3: Create the Combined Preset
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine
from presidio_anonymizer.entities import OperatorConfig


def create_multi_framework_preset(document, frameworks=("GDPR", "HIPAA", "CCPA")):
    analyzer = AnalyzerEngine()
    anonymizer = AnonymizerEngine()

    # Detect the union of required entities and quasi-identifiers across all frameworks
    entities = sorted({
        item
        for framework in frameworks
        for item in (FRAMEWORK_ENTITIES[framework]["required"]
                     + FRAMEWORK_ENTITIES[framework]["quasi_identifiers"])
    })
    results = analyzer.analyze(text=document["content"], language="en", entities=entities)

    # For each framework, apply its own operators
    anonymized_versions = {}
    for framework in frameworks:
        framework_operators = FRAMEWORK_OPERATORS[framework]
        anonymized = anonymizer.anonymize(
            text=document["content"],
            analyzer_results=results,
            operators={
                # Entity types without a framework-specific operator get a placeholder
                entity.entity_type: framework_operators.get(
                    entity.entity_type,
                    OperatorConfig("replace", params={"new_value": f"[{entity.entity_type}]"}),
                )
                for entity in results
            },
        )
        anonymized_versions[framework] = anonymized.text

    # Return one anonymized text per framework so callers can pick the version they need
    return anonymized_versions
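If a single output has to satisfy every framework at once, one option is to merge the per-framework operator maps and keep the strictest operator for each entity type. The sketch below illustrates that idea under stated assumptions: operators are represented as plain strings rather than `OperatorConfig` objects, the `STRICTNESS` ranking (with `redact` outranking `replace`) is a hypothetical choice, and the `sample` mapping is made up for the example.

```python
# Hypothetical strictness ranking: a higher value removes more information.
STRICTNESS = {"replace": 1, "redact": 2}


def strictest_operators(per_framework):
    """Merge operator maps, keeping the strictest operator for each entity type."""
    merged = {}
    for operators in per_framework.values():
        for entity_type, operator in operators.items():
            current = merged.get(entity_type)
            if current is None or STRICTNESS[operator] > STRICTNESS[current]:
                merged[entity_type] = operator
    return merged


# Made-up sample input; operator names only, no parameters.
sample = {
    "GDPR": {"PERSON": "redact", "COMPANY_NAME": "replace"},
    "HIPAA": {"PERSON": "redact", "DATE_OF_BIRTH": "redact", "FACILITY_NAME": "replace"},
    "CCPA": {"PERSON": "redact", "DATE_OF_BIRTH": "replace"},
}
merged = strictest_operators(sample)
```

In this sample, `DATE_OF_BIRTH` ends up as `redact` because one framework demands removal even though another would accept a placeholder.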
Different Regulatory Versions for Different Purposes
document = {
    "content": "Patient John Smith (DOB 1980-05-15, SSN 123-45-6789, MRN 5731) was treated at Cleveland Clinic."
}
versions = create_multi_framework_preset(document)
# The exact output depends on which recognizers are registered. Note that the
# "redact" operator removes the matched text outright (it does not insert a
# "[REDACTED]" marker), and entity types without a framework-specific operator
# are replaced with a "[ENTITY_TYPE]" placeholder.
print(f"GDPR: {versions['GDPR']}")
print(f"HIPAA: {versions['HIPAA']}")
print(f"CCPA: {versions['CCPA']}")
Conclusion: A multi-framework preset detects the union of entities covered by GDPR, HIPAA, and CCPA in one pass and applies each framework's operators, so that no PII defined by any of the three standards is left in the output.