
Beyond SSNs and Email Addresses: Anonymizing Your Organization's Custom Identifiers

Every organization has internal identifiers — employee IDs, account numbers, order IDs — that are personally identifiable in context but missed by standard PII tools. Custom entity creation closes this re-identification gap without engineering resources.

March 5, 2026 · 7 min read
Tags: custom PII detection, organizational identifiers, re-identification risk, GDPR pseudonymization, custom entity

Your GDPR anonymization tool detects email addresses. It detects phone numbers. It detects names and social security numbers. You run your support ticket exports through it, download the anonymized output, and share it with your analytics team.

Your customer account numbers (ACC-XXXXXXXX-XX format) are still in every ticket. Your order IDs (ORD-XXXXXXX) are still present. Your internal user IDs are still there.

These identifiers are pseudonymous in isolation — they don't directly identify a person without access to a lookup table. But your analytics team has access to that lookup table. Your support database has it. Your CRM has it. The anonymized export can be re-identified in seconds by anyone with access to any of these systems.

This is a GDPR pseudonymization failure — not because the tool missed standard PII, but because it couldn't know about identifiers specific to your organization.

What Standard PII Tools Detect

Standard PII detection tools — including base Microsoft Presidio configurations — are built around universal identifier formats:

What's covered:

  • National identification numbers (US SSNs, UK NINOs, EU national ID formats)
  • Email addresses (RFC 5322 format)
  • Phone numbers (E.164 and national formats)
  • Credit card numbers (Luhn algorithm validation)
  • Names (NER model-based detection)
  • Passport/driver's license numbers (country-specific formats)

What's not covered:

  • Your employee ID format (EMP-XXXXX)
  • Your customer account number format (ACC-XXXXXXXX-XX)
  • Your order ID format (ORD-XXXXXXX)
  • Your internal user ID (UUID or custom format)
  • Your internal reference codes
  • Partner-specific identifiers

Standard tools detect what's universal. Organization-specific identifiers are, by definition, not universal. They require custom configuration.

The Re-Identification Risk in Practice

A financial services firm processes customer support tickets for quality analysis. Their standard PII anonymization workflow removes:

  • Customer names ✓
  • Email addresses ✓
  • Phone numbers ✓
  • Account numbers (ACC-XXXXXXXX-XX format) ✗ — not detected

The ticket export goes to the analytics team. A data analyst joins the ticket table with the customer database on account number. Re-identification is immediate and complete.

This doesn't require sophisticated attack techniques. It's a routine SQL join that any analyst would perform to add customer demographic context to support ticket analysis. The "anonymized" export was not anonymous.
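The join described above can be sketched in a few lines. This is a minimal illustration using an in-memory SQLite database; the table names, column names, and customer values are hypothetical and not taken from any real system.

```python
import sqlite3

# Hypothetical tables and values, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tickets (ticket_id INTEGER, body TEXT, account_no TEXT);
    CREATE TABLE customers (account_no TEXT, full_name TEXT, email TEXT);
    -- The ticket body was anonymized, but the account number was left intact.
    INSERT INTO tickets VALUES
        (1, '<PERSON> reported a failed transfer.', 'ACC-00123456-AB');
    INSERT INTO customers VALUES
        ('ACC-00123456-AB', 'Jane Example', 'jane@example.com');
""")

# A routine analytics join on the account number restores the identity.
rows = conn.execute("""
    SELECT t.ticket_id, c.full_name, c.email
    FROM tickets AS t
    JOIN customers AS c ON c.account_no = t.account_no
""").fetchall()
print(rows)
```

The name and email removed from the ticket text come straight back from the customer table, because the account number still links the two.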

GDPR Article 4(5) defines pseudonymization as the "processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information," provided that the additional information is kept separately and protected by technical and organisational measures. Account numbers fail this test when the additional information (the customer database) is readily available to the people receiving the export.

Building Custom Entity Patterns

Custom entity creation follows a straightforward workflow for non-technical compliance teams:

Step 1: Identify the identifier format

Document your organization-specific identifiers and their formats:

  • Customer account: ACC-XXXXXXXX-XX (ACC prefix, 8-digit number, 2 uppercase letters)
  • Order ID: ORD-XXXXXXX (ORD prefix, 7-digit number)
  • Employee ID: EMP-XXXXX (EMP prefix, 5-digit number)
  • Internal user ID: UUID format (8-4-4-4-12 hexadecimal)

Step 2: Generate the detection pattern

Describe the format in plain language: "Account numbers that start with ACC, then a dash, then 8 digits, then a dash, then 2 uppercase letters."

AI-assisted pattern generation produces: ACC-\d{8}-[A-Z]{2}
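As a rough check, the account pattern (with `\d` as the digit class) can be tried out with Python's standard `re` module. The sketch below also derives patterns for the other Step 1 formats from their descriptions. The word boundaries (`\b`), the entity names, and the helper `find_identifiers` are assumptions added for illustration, not output of the tool.

```python
import re

# Account pattern from Step 2; the other patterns are derived from the
# Step 1 descriptions. \b is an added assumption so that a pattern does not
# match inside a longer token such as "XACC-00123456-ABC".
PATTERNS = {
    "ACCOUNT": re.compile(r"\bACC-\d{8}-[A-Z]{2}\b"),
    "ORDER": re.compile(r"\bORD-\d{7}\b"),
    "EMPLOYEE": re.compile(r"\bEMP-\d{5}\b"),
    "USER_UUID": re.compile(
        r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
        r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
    ),
}

def find_identifiers(text):
    """Return (entity_name, matched_text) pairs in order of appearance."""
    hits = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), name, match.group()))
    return [(name, value) for _, name, value in sorted(hits)]

sample = "EMP-00421 moved ORD-1234567 to ACC-00123456-AB; ignore ACC-123-AB."
print(find_identifiers(sample))
```

The malformed `ACC-123-AB` is not reported, since it has too few digits to satisfy the pattern.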

Step 3: Validate against sample data

Upload 20-30 documents containing the identifier. Verify:

  • All instances are detected (no false negatives)
  • No false positives (non-identifier text incorrectly flagged)
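The same check can be approximated offline. The sketch below assumes a handful of hand-labeled samples (hypothetical text, not real tickets) and a helper named `evaluate`, which is an illustrative name and not part of any tool.

```python
import re

ACCOUNT_RE = re.compile(r"\bACC-\d{8}-[A-Z]{2}\b")

def evaluate(pattern, labeled_samples):
    """Compare detections with hand-labeled expectations.

    Returns (false_negatives, false_positives) as lists of strings.
    """
    false_negatives, false_positives = [], []
    for text, expected in labeled_samples:
        found = set(pattern.findall(text))
        false_negatives.extend(sorted(set(expected) - found))
        false_positives.extend(sorted(found - set(expected)))
    return false_negatives, false_positives

# Hypothetical labeled samples: (document text, identifiers a reviewer marked).
samples = [
    ("Refund issued to ACC-00123456-AB.", ["ACC-00123456-AB"]),
    ("Customer wrote acc-00123456-ab in lowercase.", ["acc-00123456-ab"]),
    ("Part number ACC-20240101-EU is out of stock.", []),
]

missed, spurious = evaluate(ACCOUNT_RE, samples)
print("false negatives:", missed)
print("false positives:", spurious)
```

Here the lowercase variant is missed and a look-alike part number is flagged, which are the two kinds of finding that would send the pattern back to Step 2 for refinement.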

Step 4: Configure anonymization method

For identifiers used as join keys (account numbers or order IDs that appear in multiple systems and need to stay consistent for analysis):

  • Pseudonymize: Replace ACC-00123456-AB with ACC-99876543-XY in every document. The same input always produces the same output, so analytical joins still work, and the original value is not recoverable without the key.

For identifiers not needed for analysis:

  • Redact: Replace with [REDACTED]. Simpler, irreversible.
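One way to get consistent, format-preserving replacements is a keyed hash (HMAC). The sketch below is an assumption about how such a method could work, not a description of any particular product. The function names and the key handling are illustrative.

```python
import hashlib
import hmac
import re
import string

ACCOUNT_RE = re.compile(r"\bACC-\d{8}-[A-Z]{2}\b")

def pseudonymize_account(value, key):
    """Map an account number to a stable pseudonym of the same format."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    number = int.from_bytes(digest[:8], "big") % 10**8
    letters = "".join(string.ascii_uppercase[b % 26] for b in digest[8:10])
    return f"ACC-{number:08d}-{letters}"

def pseudonymize_text(text, key):
    """Replace every account number with its keyed pseudonym."""
    return ACCOUNT_RE.sub(lambda m: pseudonymize_account(m.group(0), key), text)

def redact_text(text):
    """Replace every account number with a fixed, irreversible marker."""
    return ACCOUNT_RE.sub("[REDACTED]", text)

# Hypothetical key; in practice it would be stored separately from the data.
key = b"hypothetical-secret-key"
ticket = "ACC-00123456-AB wrote in; later ACC-00123456-AB called again."
print(pseudonymize_text(ticket, key))
print(redact_text(ticket))
```

Because the output space is small (8 digits plus 2 letters), two different accounts can occasionally map to the same pseudonym, and a pseudonym can coincide with a real account number. A stored mapping table or format-preserving encryption avoids collisions, at the cost of more key and table management.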

Step 5: Save as preset

Saved as a team preset, the custom entity (or set of custom entities) applies consistently across all processing: batch uploads, API calls, and the browser interface. New team members automatically get the complete configuration.
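Conceptually, a preset is just the entity definitions stored in one shareable place. The sketch below uses a purely hypothetical schema (`CustomEntity`, `dump_preset`, `load_preset`) to show the idea. It is not the storage format of any specific tool.

```python
import json
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class CustomEntity:
    """Hypothetical preset entry: an entity name, its pattern, and a method."""
    name: str
    pattern: str
    method: str  # "pseudonymize" or "redact"

PRESET = [
    CustomEntity("ACCOUNT", r"\bACC-\d{8}-[A-Z]{2}\b", "pseudonymize"),
    CustomEntity("ORDER", r"\bORD-\d{7}\b", "pseudonymize"),
    CustomEntity("EMPLOYEE", r"\bEMP-\d{5}\b", "redact"),
]

def dump_preset(entities):
    """Serialize the preset so it can be shared or version-controlled."""
    return json.dumps([asdict(entity) for entity in entities], indent=2)

def load_preset(payload):
    """Rebuild the preset from its serialized form."""
    return [CustomEntity(**item) for item in json.loads(payload)]

print(dump_preset(PRESET))
```

Keeping the definitions in one serialized artifact means every processing path reads the same patterns and methods, which is the consistency the preset is meant to provide.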

Case Study: 180,000 Support Tickets

A financial services firm has customer account numbers (ACC-XXXXXXXX-XX format) appearing throughout historical support ticket exports. Standard PII tools missed them entirely.

Gap identified: After a compliance review, the team realized 180,000 historical support tickets in their analytics warehouse contained unredacted account numbers alongside (already-anonymized) names and emails.

Resolution timeline:

  1. Compliance officer defines ACC pattern (15 minutes)
  2. Test against 30 sample tickets (20 minutes)
  3. Confirm pattern accuracy (10 minutes)
  4. Process 180,000 tickets in overnight batch
  5. Replace warehouse tables with re-anonymized versions

Total time to close the compliance gap: 45 minutes of compliance officer time + overnight batch. Without custom entity creation, this would require an engineering ticket, development time, code review, and deployment — weeks, not hours.

Beyond Support Tickets: Where Custom Identifiers Appear

Custom organizational identifiers propagate across more document types than most compliance teams realize:

Internal documents:

  • Meeting notes referencing account numbers or order IDs
  • Email threads with customer references
  • Presentations with case study data

Shared with third parties:

  • Reports to regulators with case reference numbers
  • Data shared with auditors
  • Vendor documents with customer references

Research and analytics:

  • Customer journey analysis datasets
  • Support quality review datasets
  • Training data for internal ML models

Each of these requires the same custom entity configuration to produce genuinely anonymous output.

GDPR Pseudonymization vs. Anonymization: The Technical Distinction

GDPR distinguishes between:

Pseudonymization: Data that can be re-identified with access to additional information. Pseudonymized data is still personal data under GDPR. The regulation encourages pseudonymization as a risk reduction measure, but it doesn't remove GDPR obligations.

Anonymization: Data that cannot reasonably be re-identified. Anonymous data is not personal data and is not subject to GDPR.

Account numbers, order IDs, and employee IDs are pseudonymous, not anonymous, when lookup tables exist. Replacing them with consistent pseudonyms (pseudonymization) reduces risk but doesn't remove GDPR obligations. Replacing them with random tokens and discarding the mapping moves the data toward anonymization, provided nothing else in the dataset allows re-identification, but it breaks joins.
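The trade-off between the two replacement strategies shows up as soon as two datasets are joined. The following sketch compares a keyed pseudonym with a fresh random token. The key, datasets, and function names are all hypothetical.

```python
import hashlib
import hmac
import secrets

KEY = b"hypothetical-secret-key"  # would be stored apart from the data

def keyed_pseudonym(value):
    """Same input always yields the same token while the key is unchanged."""
    return hmac.new(KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

def random_token(_value):
    """A fresh token on every call; no mapping is kept."""
    return secrets.token_hex(8)

# Two hypothetical datasets that share one account number.
tickets = ["ACC-00123456-AB", "ACC-00999999-ZZ"]
payments = ["ACC-00123456-AB"]

def joinable(transform):
    """Count identifiers that still match across datasets after replacement."""
    left = {transform(value) for value in tickets}
    right = {transform(value) for value in payments}
    return len(left & right)

print("keyed pseudonyms:", joinable(keyed_pseudonym))
print("random tokens:", joinable(random_token))
```

With keyed pseudonyms the shared account still matches across the two datasets. With random tokens the link is gone, which is what protects the data and also what prevents the join.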

For sharing with third parties who don't have access to your lookup tables: pseudonymization may be sufficient (they can't re-identify without the key). For internal analytics: full anonymization or access controls on the key are necessary.

Conclusion

The standard PII detection gap is not a technical limitation of the detection algorithms — it's a configuration gap. No detection tool can know your organization's account number format unless you tell it.

Custom entity creation closes this gap in hours rather than weeks. Compliance teams — without engineering support — can define organization-specific patterns, validate them against sample data, and apply them consistently across all processing modes.

The 180,000 unredacted account numbers discovered in the case study were not there because of tool failure. They were there because the tool was never told to look for them.
