Our Hybrid Detection Approach

Regex patterns for structured data. Proven ML models for names. The best of both worlds for accurate, auditable PII detection.

How We Detect Different Entity Types

We use the best tool for each job: deterministic regex patterns for structured data, and proven ML models for names and entities. Built on Microsoft Presidio.

Entity Type | Detection Method | Examples
Structured Data | Regex Patterns | Emails, SSNs, credit cards, IBANs, phone numbers
Names & Organizations | ML Models (spaCy, Stanza) | Person names, company names, locations
48 Languages | XLM-RoBERTa | Cross-lingual entity recognition

  • Structured Data Results: 100% Reproducible. Same input = same output, every time.
  • Name Detection: High Accuracy ML. Proven NLP models with confidence scores.
  • All Detections: Fully Auditable. Position, type, confidence for every entity.

How Pattern Matching Works

Structured data is detected with carefully crafted regex patterns that match specific formats with 100% reproducibility.

Email Addresses

[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}

Matches standard email format: local-part@domain.tld
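
To make the email pattern concrete, here is a minimal Python sketch that applies it with the standard `re` module. The function name `find_emails` and the sample text are illustrative assumptions, not part of the product API; the end offset is exclusive, following Python's slicing convention.

```python
import re

# Pattern copied verbatim from the section above.
EMAIL_PATTERN = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")


def find_emails(text: str) -> list[tuple[int, int, str]]:
    """Return (start, end, value) for every match; end is exclusive."""
    return [(m.start(), m.end(), m.group()) for m in EMAIL_PATTERN.finditer(text)]


sample = "Contact john.smith@company.com or sales@example.org today."
matches = find_emails(sample)
for start, end, value in matches:
    print(f"EMAIL_ADDRESS at {start}-{end}: {value}")
```

Because the pattern is deterministic, calling `find_emails` twice on the same text returns identical results.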

Credit Card Numbers

\b(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|...)\b

Matches Visa, Mastercard, Amex, and other card formats. Each candidate match is then checked with Luhn validation.
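
A regex alone cannot compute a checksum, so pattern matching and the Luhn check are usually two steps. The sketch below assumes that split. It reproduces only the Visa and Mastercard alternatives shown above (the elided alternatives are left out on purpose), and the helper names `luhn_valid` and `find_valid_cards` are hypothetical.

```python
import re

# Only the two alternatives visible in the pattern above; the elided
# part ("...") is intentionally not reconstructed here.
CARD_PATTERN = re.compile(r"\b(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14})\b")


def luhn_valid(number: str) -> bool:
    """Luhn check: double every second digit from the right, sum, test mod 10."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        digit = int(ch)
        if i % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


def find_valid_cards(text: str) -> list[str]:
    """Keep only regex candidates that also pass the Luhn check."""
    return [m.group() for m in CARD_PATTERN.finditer(text) if luhn_valid(m.group())]


sample = "Paid with 4111111111111111, not 4111111111111112."
print(find_valid_cards(sample))
```

Both numbers in the sample match the pattern, but only the first passes the checksum, which is how the second step cuts false positives on arbitrary 16-digit strings.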

German IBAN

DE[0-9]{2}\s?[0-9]{4}\s?[0-9]{4}\s?[0-9]{4}\s?[0-9]{4}\s?[0-9]{2}

Matches German IBAN format with optional spaces
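
As a rough illustration of the optional-space handling, the following sketch applies the pattern and normalizes each match to its compact 22-character form. The function name `find_german_ibans` is an assumption made for this example, and the IBAN used is the widely published sample value, not real account data.

```python
import re

# Pattern copied verbatim from the section above.
IBAN_DE_PATTERN = re.compile(
    r"DE[0-9]{2}\s?[0-9]{4}\s?[0-9]{4}\s?[0-9]{4}\s?[0-9]{4}\s?[0-9]{2}"
)


def find_german_ibans(text: str) -> list[str]:
    """Return each matched IBAN with whitespace removed."""
    return [re.sub(r"\s", "", m.group()) for m in IBAN_DE_PATTERN.finditer(text)]


sample = "Spaced: DE89 3704 0044 0532 0130 00; compact: DE89370400440532013000."
ibans = find_german_ibans(sample)
print(ibans)
```

Normalizing after the match means the spaced and compact spellings of the same IBAN compare equal downstream.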

Built for Compliance

When auditors ask "why was this detected?" you get a clear answer with entity type, position, and confidence score.

  • GDPR Article 25: Privacy by design with explainable processing
  • ISO 27001: Documented, repeatable processes on certified infrastructure
  • Audit Trail: Every detection includes type, position, and confidence

Example Audit Response

Q: Why was "john.smith@company.com" flagged?
A: Matched EMAIL_ADDRESS pattern at position 45-67 with confidence 0.95. Detection method: regex pattern matching.
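
An audit answer like this can be generated directly from a detection record. The sketch below is one possible shape, not the service's actual data model: the `Detection` class, the `explain` helper, and the fixed 0.95 score are assumptions for illustration, and the end offset is exclusive.

```python
import re
from dataclasses import dataclass

EMAIL_PATTERN = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")
REGEX_CONFIDENCE = 0.95  # illustrative score, mirroring the example above


@dataclass(frozen=True)
class Detection:
    entity_type: str
    start: int
    end: int  # exclusive, so text[start:end] is the matched value
    confidence: float
    method: str


def detect_emails(text: str) -> list[Detection]:
    """Run the email pattern and wrap each match in an audit record."""
    return [
        Detection("EMAIL_ADDRESS", m.start(), m.end(), REGEX_CONFIDENCE,
                  "regex pattern matching")
        for m in EMAIL_PATTERN.finditer(text)
    ]


def explain(d: Detection) -> str:
    """Render a record as a one-line answer for an auditor."""
    return (f"Matched {d.entity_type} pattern at position {d.start}-{d.end} "
            f"with confidence {d.confidence}. Detection method: {d.method}.")


text = "Reach me at john.smith@company.com."
detections = detect_emails(text)
for d in detections:
    print(explain(d))
```

Storing type, position, confidence, and method on every record is what lets the explanation be rebuilt later without rerunning detection.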

Experience Hybrid Detection

Try our PII detection free with 200 tokens per cycle.