Privacy & Technology Glossary
Definitions for all terms, acronyms, and concepts used in PII anonymization and data privacy.
94 terms
2FA
Two-Factor Authentication
Authentication requiring two distinct verification factors: something the user knows (password) and something the user has (TOTP app, hardware key) or is (biometric). Supported in anonym.legal as an additional layer on top of ZK Auth.
AES-256-GCM
Advanced Encryption Standard 256-bit Galois/Counter Mode
An authenticated encryption algorithm that combines the AES block cipher (256-bit key) with Galois/Counter Mode, providing both confidentiality and tamper detection. Used in anonym.legal's reversible anonymization to encrypt replaced entities.
Anonymization
Data Anonymization
The irreversible process of removing or transforming identifying information so that individuals can no longer be identified, directly or indirectly. Under GDPR, truly anonymized data falls outside the regulation's scope.
Argon2id
Argon2id Key Derivation Function
The hybrid variant of Argon2, the winner of the 2015 Password Hashing Competition. Argon2id combines the side-channel resistance of Argon2i with the GPU-cracking resistance of Argon2d. Used in anonym.legal and the Desktop App vault for deriving encryption keys from user passphrases.
Attorney-Client Privilege
Attorney-Client Privilege Protection
Legal protection for confidential communications between attorneys and clients. In document review and e-discovery, privileged content must be identified and withheld or redacted. Custom entity types in anonym.legal can be configured to flag privileged content markers.
Audit Trail
Immutable Audit Trail
A sequential, tamper-evident log recording who accessed, modified, or processed data and when. Required by ISO 27001 (A.8.15), HIPAA Security Rule (§164.312(b)), and e-discovery rules. anonym.legal logs all anonymization operations with timestamps, entity counts, and operator IDs.
Batch Processing
Batch File Anonymization
Processing multiple files simultaneously in a single operation. anonym.legal's batch mode supports PDF, DOCX, and TXT files with per-file entity configuration, confidence thresholds, and output format selection.
BIP39
Bitcoin Improvement Proposal 39 — Mnemonic Phrases
A standard for encoding random entropy (128–256 bits plus a checksum) as a human-readable mnemonic phrase of 12–24 words, from which a cryptographic seed can be derived. Used in the anonym.legal Desktop App vault as a user-friendly backup for the Argon2id-derived encryption key.
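As a rough illustration of how BIP39 turns entropy into words, the sketch below computes the word indices only. It follows the published BIP39 scheme (a checksum of one bit per 32 bits of entropy, then 11-bit groups), but the function name `mnemonic_indices` is made up for this example, and the 2048-word list lookup is left out.

```python
import hashlib


def mnemonic_indices(entropy):
    """Map entropy bytes to BIP39 word indices (each in the range 0..2047)."""
    if len(entropy) not in (16, 20, 24, 28, 32):
        raise ValueError("entropy must be 128 to 256 bits in 32-bit steps")
    ent_bits = len(entropy) * 8
    cs_bits = ent_bits // 32  # checksum length: 1 bit per 32 bits of entropy
    # The checksum is the first cs_bits of SHA-256(entropy).
    digest = hashlib.sha256(entropy).digest()
    checksum = int.from_bytes(digest, "big") >> (256 - cs_bits)
    combined = (int.from_bytes(entropy, "big") << cs_bits) | checksum
    word_count = (ent_bits + cs_bits) // 11
    # Split into 11-bit groups, most significant group first.
    return [
        (combined >> (11 * (word_count - 1 - i))) & 0x7FF
        for i in range(word_count)
    ]


# 128 bits of entropy yield 12 indices; 256 bits yield 24.
indices = mnemonic_indices(bytes(16))
```

Each index selects one word from the standard word list, so the phrase carries both the entropy and a checksum that catches most transcription errors.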
CCPA
California Consumer Privacy Act
California privacy law granting residents the right to know, delete, and opt out of the sale of their personal information. Applies to businesses meeting revenue, data volume, or data-selling thresholds. Significantly amended by CPRA (passed November 2020, effective January 2023).
Chrome Extension
anonym.legal Chrome Extension
Browser extension (Manifest V3) that intercepts text before it is sent to AI chatbots (ChatGPT, Claude, Gemini, Perplexity, DeepSeek). Anonymizes on-the-fly and optionally decrypts AI responses using saved encryption keys.
CLOUD Act
Clarifying Lawful Overseas Use of Data Act
US federal law (2018) allowing US law enforcement to compel US-based cloud providers to produce data stored abroad. Conflicts with GDPR data transfer rules for EU residents. Anonymizing data before cloud upload is a common mitigation.
Code-Switching
Multilingual Code-Switching
The phenomenon of mixing two or more languages within a single text or conversation. Common in multilingual documents (e.g., German legal documents with English technical terms). anonym.legal's hybrid detection handles code-switched text by applying multiple language models simultaneously.
Confidence Scoring
Entity Detection Confidence Score
A 0–1 score indicating how certain the detection model is that a text span is a PII entity. anonym.legal exposes configurable confidence thresholds so users can tune precision vs. recall trade-offs for their specific use case.
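A minimal sketch of how a confidence threshold trades precision against recall. The `Detection` class and its field names are assumptions for illustration, not anonym.legal's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected span; field names are illustrative, not a real API."""
    entity_type: str
    start: int
    end: int
    score: float  # 0.0 (no confidence) to 1.0 (certain)


def filter_by_threshold(detections, threshold):
    """Keep only detections whose score meets or exceeds the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return [d for d in detections if d.score >= threshold]


detections = [
    Detection("PERSON", 0, 10, 0.85),
    Detection("PHONE_NUMBER", 20, 32, 0.40),
    Detection("EMAIL_ADDRESS", 40, 58, 1.00),
]

# A lower threshold favors recall; a higher one favors precision.
high_recall = filter_by_threshold(detections, 0.3)     # keeps all three
high_precision = filter_by_threshold(detections, 0.8)  # drops the phone number
```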
CSP
Content Security Policy
An HTTP response header and meta tag mechanism that restricts which resources (scripts, styles, images) a browser can load. anonym.legal's CSP includes object-src 'none', script-src with nonces, and upgrade-insecure-requests to prevent XSS attacks.
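The three directives named above could be assembled into a header value roughly as follows. This is a sketch only: the helper names are hypothetical, and the real policy presumably contains further directives not listed here.

```python
import base64
import secrets


def make_nonce():
    """Return a fresh, unpredictable base64 nonce for one HTTP response."""
    return base64.b64encode(secrets.token_bytes(16)).decode("ascii")


def build_csp(nonce):
    """Assemble a Content-Security-Policy value from the three directives."""
    directives = [
        "object-src 'none'",
        f"script-src 'nonce-{nonce}'",
        "upgrade-insecure-requests",
    ]
    return "; ".join(directives)


nonce = make_nonce()
header_value = build_csp(nonce)
# Only script tags carrying the matching nonce attribute are allowed to run.
script_tag = f'<script nonce="{nonce}">/* inline code */</script>'
```

A new nonce must be generated for every response; reusing one would let an injected script copy it.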
Custom Entities
Custom Entity Recognizers
User-defined PII patterns added on top of anonym.legal's built-in 285+ entity types. Supports regex patterns, word lists, and deny-lists. Useful for organization-specific identifiers such as employee IDs, internal project codes, or proprietary product names.
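To illustrate the idea of regex patterns and word lists, here is a small standalone sketch. The entity names, the `EMP-` ID format, and the project names are invented examples, and the code does not use anonym.legal's or Presidio's recognizer API.

```python
import re

# Hypothetical organization-specific patterns; names and formats are examples.
CUSTOM_PATTERNS = {
    "EMPLOYEE_ID": re.compile(r"\bEMP-\d{6}\b"),
}
# Word lists of terms to flag wherever they appear (case-insensitive).
CUSTOM_WORD_LISTS = {
    "PROJECT_CODE": ["Project Falcon", "Project Osprey"],
}


def find_custom_entities(text):
    """Return (entity_type, start, end) tuples sorted by start offset."""
    hits = []
    for entity_type, pattern in CUSTOM_PATTERNS.items():
        hits += [(entity_type, m.start(), m.end()) for m in pattern.finditer(text)]
    for entity_type, words in CUSTOM_WORD_LISTS.items():
        for word in words:
            word_re = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
            hits += [(entity_type, m.start(), m.end()) for m in word_re.finditer(text)]
    return sorted(hits, key=lambda hit: hit[1])


sample = "EMP-004217 joined project falcon last week."
entities = find_custom_entities(sample)
```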
Data Minimization
GDPR Data Minimization Principle
GDPR Article 5(1)(c) principle requiring that only data adequate, relevant, and necessary for the specified purpose is collected and processed. A core design constraint for privacy-compliant systems.
Data Residency
Data Residency Requirements
Legal or contractual requirements specifying which geographic location data must be stored and processed in. Relevant for GDPR (data transfers outside EEA), German BDSG, and sector-specific regulations in healthcare and finance.
Data Sovereignty
Digital Data Sovereignty
The principle that data is subject to the laws and governance structures of the nation in which it is collected. Broader than data residency, it encompasses control over who can access data and under what legal framework.
DDoS Protection
Distributed Denial of Service Protection
Infrastructure-level defenses against distributed denial-of-service attacks. anonym.legal's server infrastructure includes firewall rules (UFW), nginx connection limits, and Cloudflare-equivalent upstream protections to maintain availability.
De-anonymization
Re-identification Attack
The process of re-identifying individuals from supposedly anonymized datasets by cross-referencing with auxiliary information. A key risk when sharing data with insufficient anonymization depth.
Defensibility
Legally Defensible Anonymization
The ability to demonstrate to regulators, courts, or auditors that anonymization was performed using a documented, consistent, and technically sound methodology. anonym.legal's audit logs, confidence scores, and operator settings support defensible anonymization workflows.
Desktop App
anonym.legal Desktop Application
Cross-platform application (Windows, macOS, Linux) built with Tauri 2.0 and React 18. Features local file processing, BIP39 vault for offline ZK Auth, batch export, and API sync. Supports air-gapped deployments without internet access.
Differential Privacy
Differential Privacy (DP)
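A mathematical framework for releasing statistical information about datasets with a provable guarantee: the output distribution changes only slightly (bounded by a privacy parameter ε) whether or not any single individual's data is included. Typically achieved by adding calibrated random noise to aggregate query results, which limits re-identification even when an attacker can issue many queries.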
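One common way to meet this guarantee is the Laplace mechanism, sketched below for a counting query. The function names are illustrative; a production system would use a vetted library and a secure noise source rather than Python's `random` module.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count, epsilon, rng):
    """Release a count with epsilon-differential privacy.

    A counting query has sensitivity 1 (adding or removing one person changes
    the result by at most 1), so the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(42)  # seeded only to make the sketch reproducible
noisy = private_count(1000, epsilon=0.5, rng=rng)
```

Smaller values of `epsilon` give stronger privacy and noisier answers; larger values give the opposite.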
Digital Identifiers
Digital Identity Entity Types
Entity types for online and digital identifiers: EMAIL_ADDRESS, PHONE_NUMBER, IP_ADDRESS (IPv4 and IPv6), URL, DOMAIN_NAME, CRYPTO (Bitcoin/Ethereum addresses), and platform-specific identifiers.
DLP
Data Loss Prevention
A security discipline and category of software tools that detect and prevent unauthorized transmission of sensitive data outside an organization. anonym.legal functions as a browser-layer and AI-layer DLP solution for PII.
DPA
Data Processing Agreement
A legally binding contract between a data controller and data processor, required by GDPR Article 28. Specifies the subject matter, duration, nature, purpose, and type of personal data processing, and the rights and obligations of both parties.
DPIA
Data Protection Impact Assessment
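A risk assessment required by GDPR Article 35 for processing likely to result in a high risk to individuals' rights and freedoms. Article 35(3) names three cases in particular: systematic and extensive profiling with legal or similarly significant effects, large-scale processing of special-category data (such as health data), and large-scale systematic monitoring of publicly accessible areas.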
e-Discovery
Electronic Discovery
The process of identifying, collecting, and producing electronically stored information in legal proceedings. Requires redacting PII and privileged information from produced documents. A primary use case for legal departments using anonym.legal.
E2EE
End-to-End Encryption
Encryption in which only communicating parties can read the messages; the service provider has no access to plaintext. In anonym.legal's ZK Auth mode, encryption keys never leave the client device, achieving E2EE for anonymized output storage.
Entity Type
PII Entity Type
A category of personal information that the detection engine recognizes and can anonymize. Examples: PERSON, EMAIL_ADDRESS, PHONE_NUMBER, CREDIT_CARD, IBAN_CODE, US_SSN, IP_ADDRESS. anonym.legal supports 285+ entity types across 48 languages.
EU Data Residency
European Union Data Residency
The guarantee that data is stored and processed exclusively within EU/EEA territory. anonym.legal's production servers are in Germany (Hetzner, Falkenstein), ensuring all processing occurs under GDPR jurisdiction without cross-border data transfer implications.
Financial Entities
Financial PII Entity Types
Entity types covering financial identifiers: CREDIT_CARD (Luhn checksum), IBAN_CODE (ISO 13616 format with mod-97 check digits), SWIFT_CODE (BIC format), US_BANK_NUMBER, ES_NIF (Spanish tax ID). Where a checksum exists, it is validated to minimize false positives.
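The two checksums mentioned here are public algorithms and can be sketched in a few lines. This shows the general technique, not anonym.legal's own validators; the function names are made up for the example.

```python
def luhn_valid(number):
    """Check a card number with the Luhn algorithm (spaces and dashes ignored)."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def iban_valid(iban):
    """Check an IBAN with the mod-97 test (the remainder must equal 1)."""
    compact = iban.replace(" ", "").upper()
    if len(compact) < 5 or not (compact.isascii() and compact.isalnum()):
        return False
    # Move the country code and check digits to the end.
    rearranged = compact[4:] + compact[:4]
    # Replace letters with numbers: A=10, B=11, ..., Z=35.
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1
```

A regex can propose any 16-digit run as a card number; running `luhn_valid` on each candidate discards most random digit strings before they are reported as PII.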
FOIA
Freedom of Information Act
US federal law (and equivalent statutes in other jurisdictions) granting public access to government records. Requires redaction of PII and other exempt information before disclosure — a primary use case for legal and government anonymization workflows.
GDPR
General Data Protection Regulation
EU Regulation 2016/679, the primary data protection framework for the European Union. Applies to organizations established in the EU and to those outside it that offer goods or services to, or monitor the behavior of, individuals in the EU. Fines reach up to €20M or 4% of global annual turnover, whichever is higher. Key rights: access, erasure, portability, restriction, objection.
GDPR Article 25
GDPR Article 25 — Data Protection by Design and by Default
Requires controllers to implement appropriate technical and organizational measures (such as pseudonymization and data minimization) both at the time of system design and by default during processing.
GDPR Article 32
GDPR Article 32 — Security of Processing
Requires controllers and processors to implement appropriate technical and organizational measures to ensure a risk-appropriate security level, including encryption, pseudonymization, confidentiality, integrity, availability, and resilience of processing systems.
GenAI DLP
Generative AI Data Loss Prevention
A specialized DLP category focused on preventing PII and confidential data from being included in prompts sent to generative AI models (ChatGPT, Claude, Gemini). anonym.legal's Chrome Extension and MCP Server address this risk at the point of input.
Government ID
Government Identifier Entity Types
Entity types for national and government-issued identifiers: US_SSN, US_PASSPORT, UK_NHS, ES_NIF, DE_PERSONALAUSWEIS, FR_INSEE, IT_FISCAL_CODE, and 50+ other country-specific ID formats. Detected using country-specific regex + checksum patterns.
Hashing
Cryptographic Hashing
A one-way transformation of data into a fixed-length digest using algorithms such as SHA-256. Used for consistent pseudonymization, deduplication, and integrity verification. Hash values cannot be reversed but can be vulnerable to rainbow table attacks if not salted.
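A minimal sketch of salted hashing for consistent pseudonymization, using only Python's standard library. The function name and the salt handling are illustrative assumptions.

```python
import hashlib
import secrets


def pseudonymize(value, salt):
    """Return the hex SHA-256 digest of salt + value."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


salt = secrets.token_bytes(16)        # keep secret; reuse for consistent output
other_salt = secrets.token_bytes(16)  # stands in for a different dataset or tenant

a = pseudonymize("alice@example.com", salt)
b = pseudonymize("alice@example.com", salt)
c = pseudonymize("alice@example.com", other_salt)
# a == b: the same salt and input always give the same pseudonym.
# a != c: a different salt gives an unrelated digest, so precomputed
# (rainbow) tables built without the salt are useless.
```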
Healthcare Entities
Healthcare PII Entity Types
Entity types for the 18 HIPAA Safe Harbor identifiers and additional health-related PII: US_MRN (medical record numbers), MEDICAL_LICENSE, HEALTHCARE_PLAN_BENEFICIARY, and diagnosis/treatment context entities.
Hetzner
Hetzner Online GmbH
German cloud and hosting provider where anonym.legal's production infrastructure runs. Located in Falkenstein, Saxony (datacenter fsn1) with ISO 27001 certification. Chosen for EU data residency, compliance posture, and GDPR-friendly jurisdiction under German law.
HIPAA
Health Insurance Portability and Accountability Act
US federal law establishing standards for protecting sensitive patient health information. The Privacy Rule governs PHI use; the Security Rule requires administrative, physical, and technical safeguards for electronic PHI (ePHI). Violations carry fines of up to roughly $1.9M per violation category per year; the cap is adjusted annually for inflation.
HIPAA Safe Harbor
HIPAA Safe Harbor De-identification Method
One of two HIPAA-approved de-identification methods requiring removal of all 18 specified patient identifiers (name, address, dates, phone numbers, SSN, email, IP address, biometrics, etc.) to render health data not individually identifiable.
HSTS
HTTP Strict Transport Security
A web security policy mechanism that forces browsers to only use HTTPS connections. anonym.legal sets Strict-Transport-Security: max-age=31536000; includeSubDomains to prevent protocol downgrade attacks and cookie hijacking.
Hybrid Detection
Hybrid NLP + Regex + ML Detection
anonym.legal's three-layer approach: regex patterns for structured PII (phone numbers, IBANs, credit cards), NLP/NER models for contextual entities (names, organizations, locations), and ML classifiers for ambiguous cases. Reduces both false positives and false negatives.
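When several layers flag overlapping spans, the results have to be reconciled somehow. The source does not say how anonym.legal does this; the sketch below assumes one simple policy (the highest score wins, and longer spans win ties) purely for illustration.

```python
def resolve_overlaps(detections):
    """Keep the best-scoring detection among any that overlap.

    Each detection is an (entity_type, start, end, score) tuple.
    """
    kept = []
    # Visit the best scores first; longer spans win ties.
    for det in sorted(detections, key=lambda d: (-d[3], -(d[2] - d[1]))):
        _, start, end, _ = det
        # Accept the span only if it overlaps nothing already kept.
        if all(end <= k[1] or start >= k[2] for k in kept):
            kept.append(det)
    return sorted(kept, key=lambda d: d[1])


regex_hits = [("PHONE_NUMBER", 10, 22, 0.9)]
ner_hits = [("PERSON", 0, 4, 0.85), ("DATE_TIME", 14, 22, 0.5)]
merged = resolve_overlaps(regex_hits + ner_hits)
```

Here the low-scoring DATE_TIME span is dropped because it lies inside the higher-scoring PHONE_NUMBER span.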
Image Redactor
Presidio Image Redactor Service
A specialized backend service (port 8013) that detects and redacts PII from image files (PNG, JPEG) using OCR and Presidio analysis. Applies black-bar redaction over detected PII regions in the original image.
Insurance Identifiers
Insurance Entity Types
Entity types for insurance-related identifiers: US_NPI (National Provider Identifier for healthcare providers), HEALTHCARE_PLAN_BENEFICIARY, and country-specific health insurance numbers (e.g., DE_HEALTH_INSURANCE_NUMBER).
ISO 27001
ISO/IEC 27001 Information Security Management
International standard for information security management systems (ISMS). Certification requires documented policies, risk assessments, and controls. anonym.legal's EU servers are ISO 27001-certified, ensuring structured security governance.
ISO 27001 SoA
Statement of Applicability
A mandatory ISO 27001 document listing all Annex A controls, indicating which are applicable to the organization, and providing justification for inclusions and exclusions. Required for certification and audits.
JWT
JSON Web Token
A compact, URL-safe token format used for transmitting claims between parties. anonym.legal uses JWTs signed with HS256 for internal service-to-service authentication (e.g., frontend → Presidio API). Tokens are short-lived and validated server-side.
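To show what HS256 signing and server-side validation involve, here is a standard-library sketch. The claim names, the secret, and the helper functions are assumptions for this example; real services would normally rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data):
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text):
    """Decode base64url, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _mac(secret, signing_input):
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def sign_jwt(claims, secret):
    """Build header.payload.signature with an HMAC-SHA256 signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    return signing_input + "." + _b64url(_mac(secret, signing_input))


def verify_jwt(token, secret, now=None):
    """Return the claims if signature and expiry check out, else raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    expected = _mac(secret, header_b64 + "." + payload_b64)
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims.get("exp", 0) <= (time.time() if now is None else now):
        raise ValueError("token expired")
    return claims


# Hypothetical short-lived service token: valid for five minutes.
token = sign_jwt({"sub": "frontend", "exp": int(time.time()) + 300}, b"demo-secret")
claims = verify_jwt(token, b"demo-secret")
```

Because HS256 is symmetric, every service that verifies tokens also holds the signing secret, which is why it suits internal service-to-service calls rather than third-party verification.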
KEK
Key Encryption Key
A key used to encrypt other keys rather than data directly. In anonym.legal's ZK architecture, the user's passphrase-derived key acts as a KEK to protect the per-document encryption keys stored in the encrypted vault.
Language Detection
Automatic Language Detection
The automatic identification of the language of input text before PII analysis. anonym.legal detects language at the request level and routes to the appropriate NER model pipeline, with English as fallback for unsupported languages.
Masking
Data Masking
Replacing sensitive values with realistic but fictitious data that preserves format and structure. Used for testing environments, analytics, and sharing datasets without exposing real PII.
MCP
Model Context Protocol
An open protocol by Anthropic enabling AI models to interact with external tools and data sources in a standardized way. anonym.legal implements an MCP Server so AI coding tools can invoke anonymization without leaving their workflow.
MCP Server
Model Context Protocol Server
anonym.legal's MCP Server integration enables AI coding assistants (Claude Desktop, Cursor, VS Code Copilot) to call the anonymization API directly as a tool. PII is stripped from code, prompts, and context before being sent to the AI model.
ML Models
Machine Learning Models for PII Detection
Statistical models trained on labeled text corpora to recognize PII in context. anonym.legal uses both spaCy transformer pipelines and fine-tuned XLM-RoBERTa for multilingual entity recognition at production scale.
NER
Named Entity Recognition
A natural language processing task that identifies and classifies named entities in text into predefined categories such as persons, organizations, locations, dates, and medical identifiers. The core ML technique powering PII detection in anonym.legal.
NIS2
Network and Information Security Directive 2
EU Directive 2022/2555 expanding the original NIS Directive to cover more sectors (healthcare, energy, transport, digital infrastructure) and strengthening cybersecurity requirements. The transposition deadline was October 17, 2024; most EU member states missed it and the European Commission opened infringement proceedings against non-compliant states.
NLP
Natural Language Processing
A branch of artificial intelligence concerned with the interaction between computers and human language. In PII detection, NLP models understand context, grammar, and semantics to identify entities that regex patterns alone would miss.
Office Add-in
anonym.legal Microsoft Office Add-in
Microsoft Office extension integrating PII anonymization directly into Word, Excel, and PowerPoint. Supports in-document redaction, preset management, ZK Auth, and sync across devices. Available from Microsoft AppSource.
Operators
Anonymization Operators
The replacement strategy applied to detected PII. anonym.legal supports REPLACE (placeholder text), REDACT (empty string), MASK (asterisks), HASH (SHA-256 digest), ENCRYPT (reversible AES-256-GCM), and CUSTOM (user-defined replacement).
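The effect of each operator on a detected span can be sketched as below. The `<ENTITY_TYPE>` placeholder format and the function names are assumptions; ENCRYPT is omitted because AES-256-GCM is not available in Python's standard library.

```python
import hashlib


def apply_operator(value, operator, entity_type, custom=None):
    """Return the replacement text for one detected value."""
    if operator == "REPLACE":
        return f"<{entity_type}>"   # placeholder format is an assumption
    if operator == "REDACT":
        return ""
    if operator == "MASK":
        return "*" * len(value)
    if operator == "HASH":
        return hashlib.sha256(value.encode("utf-8")).hexdigest()
    if operator == "CUSTOM":
        if custom is None:
            raise ValueError("CUSTOM needs a replacement string")
        return custom
    raise ValueError(f"unsupported operator: {operator}")


def anonymize(text, spans, operator, custom=None):
    """Apply one operator to every (entity_type, start, end) span in text."""
    # Work right to left so earlier offsets stay valid after each replacement.
    for entity_type, start, end in sorted(spans, key=lambda s: s[1], reverse=True):
        replacement = apply_operator(text[start:end], operator, entity_type, custom)
        text = text[:start] + replacement + text[end:]
    return text


text = "Call Anna Schmidt at 030 1234567."
spans = [("PERSON", 5, 17), ("PHONE_NUMBER", 21, 32)]
replaced = anonymize(text, spans, "REPLACE")
masked = anonymize(text, spans, "MASK")
```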
Over-Redaction
Over-Redaction (False Positives)
Removing more information than necessary, reducing document utility and potentially constituting spoliation in legal proceedings. Caused by low confidence thresholds or overly broad entity selection. Tunable via anonym.legal's threshold and entity controls.
PCI DSS
Payment Card Industry Data Security Standard
Security standard for organizations handling payment card data, maintained by the PCI Security Standards Council. Requires encryption, access controls, logging, and regular testing. Non-compliance can result in fines and loss of card processing privileges.
PHI
Protected Health Information
Any health-related information linked to an identifiable individual, regulated under HIPAA in the US. Includes diagnoses, treatment records, insurance data, and any of the 18 HIPAA Safe Harbor identifiers.
PII
Personally Identifiable Information
Any data that can identify a specific individual directly or in combination with other data. Examples: names, email addresses, social security numbers, IP addresses, biometric records.
Presets
Anonymization Presets
Saved configurations of selected entity types, confidence thresholds, and output options that can be applied with one click. Presets sync across Web App, Office Add-in, and Desktop App via encrypted cloud storage.
Presidio
Microsoft Presidio
An open-source data protection and anonymization SDK by Microsoft. anonym.legal's detection engine is built on Presidio's analyzer and anonymizer services, extended with 285+ custom entity recognizers across 48 languages.
Presidio Analyzer
Microsoft Presidio Analyzer Service
The detection component of anonym.legal's backend (port 8011). Accepts text and returns a list of detected PII entities with their positions, types, and confidence scores. Extended with 285+ custom recognizers across 48 languages.
Presidio Anonymizer
Microsoft Presidio Anonymizer Service
The transformation component of anonym.legal's backend (port 8012). Takes text and analyzer results as input, applies the selected operator (REPLACE, REDACT, MASK, HASH, ENCRYPT) to each detected entity, and returns the anonymized text.
Privacy by Design
Privacy by Design and Default
The principle, mandated by GDPR Article 25, that data protection measures are built into systems from the outset rather than added as an afterthought. Encompasses data minimization, access controls, encryption, and pseudonymization at the architecture level.
Pseudonymization
Data Pseudonymization
Replacing direct identifiers with artificial values (pseudonyms) while retaining the ability to re-identify individuals using separately kept additional information, such as a key. GDPR Article 4(5) defines the term; pseudonymized data remains personal data and stays within the regulation's scope.
Rate Limiting
API Rate Limiting
Controls on the number of API requests a client can make within a time window. Prevents abuse and ensures fair resource allocation. anonym.legal applies per-user rate limits based on plan tier, with exponential backoff recommended for retry logic.
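A client-side retry loop with exponential backoff might look like the sketch below. The `RateLimited` exception, the default delays, and the retry count are assumptions; the actual limits and error format are defined by the API, not by this example.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error a client raises on an HTTP 429 response."""


def backoff_delays(retries, base=0.5, cap=30.0):
    """Delays in seconds that double on each attempt, limited to cap."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


def call_with_backoff(request, retries=5, base=0.5, cap=30.0, sleep=time.sleep):
    """Call request(); on RateLimited, wait with jittered backoff and retry."""
    for delay in backoff_delays(retries, base, cap):
        try:
            return request()
        except RateLimited:
            # Full jitter: a random wait up to the current delay keeps many
            # clients from retrying at the same instant.
            sleep(random.uniform(0, delay))
    return request()  # final attempt; any error propagates to the caller
```

With the defaults, the upper bounds on the waits are 0.5, 1, 2, 4, and 8 seconds before the last attempt.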
Redaction
Data Redaction
Permanently removing or obscuring sensitive information from documents, replacing it with a visual marker such as [REDACTED] or a black bar. Unlike encryption, redaction is one-way and the original data cannot be recovered.
Regex
Regular Expression Pattern Matching
Pattern-based text matching using formal language syntax. In PII detection, regex handles structurally predictable identifiers (phone numbers, credit cards, IBANs, email addresses), paired with checksum or range validation where the format defines one. Complements NER for hybrid detection.
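A small sketch of the pattern-then-validate approach, using deliberately simplified patterns. The regexes below are illustrative and far less thorough than production recognizers.

```python
import ipaddress
import re

# Simplified patterns for illustration only.
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
IPV4_CANDIDATE_RE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")


def find_emails(text):
    """Return all substrings that look like email addresses."""
    return EMAIL_RE.findall(text)


def find_ipv4(text):
    """The regex proposes candidates; ipaddress rejects invalid ones."""
    found = []
    for candidate in IPV4_CANDIDATE_RE.findall(text):
        try:
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue  # e.g. an octet above 255
        found.append(candidate)
    return found


sample = "Mail bob@example.org from 192.168.1.20, not 999.1.1.1."
emails = find_emails(sample)
addresses = find_ipv4(sample)
```

The second step matters: `999.1.1.1` matches the pattern but is not a valid address, so validation removes a false positive the regex alone would report.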
REST API
RESTful API
anonym.legal exposes a RESTful HTTP API for programmatic integration. Endpoints include /api/analyze, /api/anonymize, /api/image, and /api/structured. Authenticated via JWT bearer tokens. Full OpenAPI documentation available in the API Reference.
Rule 26
Federal Rules of Civil Procedure Rule 26
US civil procedure rule governing discovery obligations. Rule 26(g) requires attorneys to certify that disclosures are complete and correct and that discovery requests, responses, and objections are not made for an improper purpose. Producing documents without properly redacting PII can undermine that certification.
SCCs
Standard Contractual Clauses
Pre-approved GDPR-compliant contract clauses for transferring personal data from the EU/EEA to third countries. Updated by the European Commission in 2021 (Implementing Decision (EU) 2021/914) to address Schrems II requirements, including the need for a transfer impact assessment.
Schrems II
Schrems II Ruling (C-311/18)
2020 Court of Justice of the EU ruling invalidating the EU-US Privacy Shield framework for transatlantic data transfers, citing insufficient US surveillance law protections. Requires supplementary measures (encryption, anonymization) when using Standard Contractual Clauses.
SHA-256
Secure Hash Algorithm 256-bit
A cryptographic hash function producing a 256-bit digest. Used in anonym.legal for HMAC authentication of API requests, ZK auth proofs, and consistent entity pseudonymization (hashing with a fixed salt produces the same replacement for the same original value).
spaCy
spaCy NLP Library
Industrial-strength open-source NLP library in Python. anonym.legal uses spaCy's transformer-based models for 24 languages (en, de, fr, es, it, pt, nl, pl, ru, zh, ja, ko, and more) for named entity recognition in the Presidio pipeline.
Spoliation
Evidence Spoliation
The destruction, alteration, or failure to preserve evidence relevant to litigation. Over-aggressive redaction that renders documents unreadable can constitute spoliation. Calibrating anonymization precision (confidence thresholds, entity selection) is important for legally defensible redaction.
Stanza
Stanza NLP Library (Stanford NLP)
Stanford NLP Group's Python NLP toolkit supporting 70+ languages with state-of-the-art neural models. Used as a supplementary NER backend in anonym.legal for languages not covered by spaCy models.
TLS
Transport Layer Security
The cryptographic protocol securing data in transit. anonym.legal enforces TLS 1.2 minimum with TLS 1.3 preferred, HSTS with one-year max-age, and HTTP/2. All traffic between clients and the server is encrypted in transit.
Token System
anonym.legal Credit Token System
Usage-based billing where API calls consume tokens calculated from text length, entity count, and processing mode (analyze vs. anonymize). Token costs are configurable in the DB and displayed in real-time before processing.
Tokenization
Data Tokenization
Replacing sensitive data with a non-sensitive placeholder (token) that maps back to the original in a secure vault. Unlike encryption, the token itself has no mathematical relationship to the original data.
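The difference from encryption is easiest to see in code: the token is random, and only the vault's lookup table connects it to the original. The `TokenVault` class below is an in-memory stand-in written for this example, not a real product API.

```python
import secrets


class TokenVault:
    """In-memory stand-in for a secure vault (illustrative only).

    A real vault would be encrypted at rest and access-controlled.
    """

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value):
        """Return the existing token for value, or mint a new random one."""
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)  # random, so unrelated to value
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token):
        """Look up the original value; raises KeyError for unknown tokens."""
        return self._token_to_value[token]


vault = TokenVault()
card_token = vault.tokenize("4111 1111 1111 1111")
original = vault.detokenize(card_token)
```

Without access to the vault, the token cannot be reversed by any amount of computation, because it was never computed from the value in the first place.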
Under-Redaction
Under-Redaction (False Negatives)
Failing to remove all PII, leaving individuals exposed in shared documents. The more common compliance risk. Caused by high confidence thresholds, missing entity types, or novel PII formats. Mitigated by anonym.legal's hybrid detection and custom entity support.
Universal Entities
Language-Universal Entity Types
Entity types detected regardless of text language, typically through format-based regex with checksum validation. Examples: CREDIT_CARD, IBAN_CODE, EMAIL_ADDRESS, PHONE_NUMBER, IP_ADDRESS, URL, CRYPTO address.
Vault
Encryption Key Vault
Secure local storage for encryption keys in the Desktop App, protected by Argon2id key derivation from a master passphrase. Keys are stored encrypted using AES-256-GCM and backed up via BIP39 mnemonic phrases.
Vehicle Identifiers
Vehicle Entity Types
Entity types for vehicle-related identifiers: US_DRIVER_LICENSE, UK_DRIVER_LICENSE, EU_DRIVER_LICENSE, VIN (Vehicle Identification Number), and country-specific vehicle registration plate formats.
Web App
anonym.legal Web Application
Browser-based interface at anonym.legal for PII analysis, anonymization, and decryption. Supports text input, file upload (PDF, DOCX, TXT), batch processing, ZK Auth, 48 languages, and 285+ entity types. No installation required.
XChaCha20
XChaCha20-Poly1305
An authenticated encryption algorithm offering high performance on systems without AES hardware acceleration. Uses a 192-bit nonce, extended from the 96-bit nonce of IETF ChaCha20-Poly1305 (RFC 8439), which makes collisions between randomly generated nonces negligibly likely. Used as an alternative cipher in anonym.legal's encryption layer.
XLM-RoBERTa
Cross-Lingual RoBERTa
A multilingual transformer language model trained on 100 languages, developed by Meta AI. Used in anonym.legal for cross-lingual NER tasks, particularly for entity types and languages where monolingual models are unavailable.
Zero-Knowledge
Zero-Knowledge Architecture
A system design where the service provider has no access to users' plaintext data or encryption keys. All encryption and decryption happens client-side; the server never sees the original content. This limits exposure to insider threats and compelled disclosure, since the provider holds nothing readable to hand over.
ZK Auth
Zero-Knowledge Authentication
anonym.legal's authentication system where encryption keys are derived client-side from the user's passphrase using Argon2id, never transmitted to or stored on the server. The server stores only a cryptographic proof, not the key or passphrase.