Our Hybrid Detection Approach
Regex patterns for structured data. Proven ML models for names. The best of both worlds for accurate, auditable PII detection.
How We Detect Different Entity Types
We use the best tool for each job: deterministic regex patterns for structured data, and proven ML models for names and entities. Built on Microsoft Presidio.
| Entity Type | Detection Method | Examples |
|---|---|---|
| Structured Data | Regex Patterns | Emails, SSNs, credit cards, IBANs, phone numbers |
| Names & Organizations | ML Models (spaCy, Stanza) | Person names, company names, locations |
| Multilingual Entities (48 languages) | XLM-RoBERTa | Cross-lingual entity recognition |

| Scope | Property | Details |
|---|---|---|
| Structured Data Results | 100% Reproducible | Same input = same output, every time |
| Name Detection | High Accuracy ML | Proven NLP models with confidence scores |
| All Detections | Fully Auditable | Position, type, confidence for every entity |
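The split between regex and ML detection can be sketched as a small dispatcher. This is a minimal illustration only, not the product's or Presidio's implementation: `Span`, `regex_spans`, `detect`, and `toy_ner` are hypothetical names, the scores are placeholder values, and `toy_ner` is a stand-in for a real spaCy or Stanza model.

```python
import re
from typing import Callable, NamedTuple


class Span(NamedTuple):
    entity_type: str
    start: int
    end: int        # exclusive
    score: float
    method: str


# Deterministic recognizers for structured data (only one shown here).
REGEX_RECOGNIZERS = {
    "EMAIL_ADDRESS": re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"),
}


def regex_spans(text: str) -> list[Span]:
    """Run every regex recognizer; the 0.95 score is an illustrative constant."""
    return [
        Span(name, m.start(), m.end(), 0.95, "regex")
        for name, pattern in REGEX_RECOGNIZERS.items()
        for m in pattern.finditer(text)
    ]


def detect(text: str, ner: Callable[[str], list[Span]]) -> list[Span]:
    """Combine regex results with results from a pluggable NER callable."""
    return sorted(regex_spans(text) + ner(text), key=lambda s: s.start)


def toy_ner(text: str) -> list[Span]:
    """Placeholder for an ML model: finds one hard-coded name."""
    name = "Jane Doe"
    i = text.find(name)
    return [Span("PERSON", i, i + len(name), 0.85, "ml")] if i >= 0 else []


text = "Jane Doe wrote to jane@example.com."
for span in detect(text, toy_ner):
    print(span.entity_type, span.start, span.end, span.method)
```

Keeping the ML component behind a plain callable means the regex path stays deterministic regardless of which model is plugged in.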
How Pattern Matching Works
Structured data is detected with regex patterns that each target one specific format. Matching is deterministic, so the same input always produces the same detections.
Email Addresses
[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
Matches standard email format: local-part@domain.tld
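As a rough illustration, the email pattern can be applied with Python's standard `re` module. The names `EMAIL_PATTERN` and `find_emails` are made up for this sketch, and the sample sentence is invented.

```python
import re

# Pattern copied from the text; the constant name is illustrative.
EMAIL_PATTERN = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")


def find_emails(text: str) -> list[tuple[int, int, str]]:
    """Return (start, end, match) for each email-like substring; end is exclusive."""
    return [(m.start(), m.end(), m.group()) for m in EMAIL_PATTERN.finditer(text)]


sample = "Contact john.smith@company.com or sales@example.org today."
print(find_emails(sample))
# → [(8, 30, 'john.smith@company.com'), (34, 51, 'sales@example.org')]
```

Because the pattern has no random or learned component, running `find_emails` twice on the same text returns identical results.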
Credit Card Numbers
\b(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|...)\b
Matches Visa, Mastercard, Amex, and other card formats. Each candidate match is then checked with Luhn validation, since a regex alone cannot verify the checksum.
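One way to pair the pattern with a Luhn check is sketched below. It uses only the Visa and Mastercard alternatives that are visible in the pattern; the elided alternatives are left out rather than guessed. `CARD_PATTERN`, `luhn_valid`, and `find_cards` are hypothetical names, and the numbers are standard test card numbers.

```python
import re

# Only the two alternatives shown in the text (Visa, Mastercard).
CARD_PATTERN = re.compile(r"\b(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14})\b")


def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, check mod 10."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


def find_cards(text: str) -> list[str]:
    """Keep only regex matches that also pass the Luhn check."""
    return [m.group() for m in CARD_PATTERN.finditer(text) if luhn_valid(m.group())]


text = "Paid with 4111111111111111, not 4111111111111112."
print(find_cards(text))  # → ['4111111111111111']
```

The second number has the right shape for a Visa card but fails the checksum, so it is dropped. This is how the checksum step reduces false positives from arbitrary 16-digit strings.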
German IBAN
DE[0-9]{2}\s?[0-9]{4}\s?[0-9]{4}\s?[0-9]{4}\s?[0-9]{4}\s?[0-9]{2}
Matches German IBAN format with optional spaces
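The optional `\s?` groups mean that both the printed form (blocks of four) and the compact form match. The sketch below shows this and normalizes matches by stripping whitespace; `IBAN_DE_PATTERN` and `find_german_ibans` are illustrative names, and the IBAN is a commonly used example value.

```python
import re

# Pattern copied from the text; the constant name is illustrative.
IBAN_DE_PATTERN = re.compile(
    r"DE[0-9]{2}\s?[0-9]{4}\s?[0-9]{4}\s?[0-9]{4}\s?[0-9]{4}\s?[0-9]{2}"
)


def find_german_ibans(text: str) -> list[str]:
    """Return matched IBANs with internal whitespace removed."""
    return [re.sub(r"\s", "", m.group()) for m in IBAN_DE_PATTERN.finditer(text)]


spaced = "IBAN: DE89 3704 0044 0532 0130 00"
compact = "IBAN: DE89370400440532013000"
print(find_german_ibans(spaced))   # → ['DE89370400440532013000']
print(find_german_ibans(compact))  # → ['DE89370400440532013000']
```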
Built for Compliance
When auditors ask "why was this detected?" you get a clear answer with entity type, position, and confidence score.
- GDPR Article 25: Privacy by design with explainable processing
- ISO 27001: Documented, repeatable processes on certified infrastructure
- Audit Trail: Every detection includes type, position, and confidence
Example Audit Response
Q: Why was "john.smith@company.com" flagged?
A: Matched EMAIL_ADDRESS pattern at position 45-67 (end exclusive; the address is 22 characters long) with confidence 0.95. Detection method: regex pattern matching.
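An audit answer of this kind can be generated from a small detection record. The sketch below is an assumption about how such a record might look, not the product's actual schema: `Detection`, `detect_emails`, and `explain` are hypothetical names, the fixed 0.95 score is a placeholder, and the sample sentence is invented.

```python
import re
from dataclasses import dataclass

EMAIL_PATTERN = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")


@dataclass(frozen=True)
class Detection:
    entity_type: str
    start: int         # offset of the first matched character
    end: int           # offset just past the last matched character
    confidence: float
    method: str


def detect_emails(text: str) -> list[Detection]:
    """Record type, position, confidence, and method for every email match."""
    return [
        Detection("EMAIL_ADDRESS", m.start(), m.end(), 0.95, "regex pattern matching")
        for m in EMAIL_PATTERN.finditer(text)
    ]


def explain(text: str, d: Detection) -> str:
    """Render one detection as a human-readable audit answer."""
    value = text[d.start:d.end]
    return (
        f'"{value}" matched {d.entity_type} pattern at position '
        f"{d.start}-{d.end} with confidence {d.confidence}. "
        f"Detection method: {d.method}."
    )


text = "Please reply to john.smith@company.com soon."
for detection in detect_emails(text):
    print(explain(text, detection))
```

Storing offsets rather than the matched value lets the audit log reference the finding without duplicating the personal data itself.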
Experience Hybrid Detection
Try our PII detection free with 200 tokens per cycle.