API Log PII Exposure
API logs are essential for troubleshooting, but a typical entry captures PII verbatim:
POST /api/users/create HTTP/1.1
{"name": "John Doe", "email": "john@example.com", "ssn": "123-45-6789", "phone": "555-123-4567"}
Response 201:
{"user_id": "12345", "email": "john@example.com", "created_at": "2025-03-08T14:23:15Z"}
In a monitoring system (Splunk, Datadog, New Relic), these logs are visible in:
- Engineering team dashboards
- Support troubleshooting systems
- Incident response team
- Sometimes: Customers (self-service support portals)
GDPR risk: PII is visible across multiple systems, potentially in non-EU regions.
The Masking Challenge
Problem 1: Information Loss
If you mask email:
{"email": "[MASKED]"}
Debugging becomes harder: you can't tell whether two log entries belong to the same user or to different users.
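One way to keep correlation without exposing the address is to replace it with a keyed hash instead of a fixed `[MASKED]` string. The sketch below is only an illustration: the `pseudonymize` helper, the `pii_` prefix, and the hard-coded key are assumptions, and a real key would live in a secrets manager.

```python
import hashlib
import hmac

# Hypothetical secret for illustration; load it from a secrets manager in practice.
PSEUDONYM_KEY = b"example-key-do-not-use-in-production"


def pseudonymize(value: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Return a stable stand-in for a PII value that cannot be reversed without the key."""
    # Normalize so that trivial variations of the same address map to the same output.
    normalized = value.strip().lower()
    digest = hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncating to 12 hex characters is an arbitrary choice to keep log lines short.
    return f"pii_{digest[:12]}"


log_a = {"email": pseudonymize("john@example.com")}
log_b = {"email": pseudonymize("John@Example.com ")}
log_c = {"email": pseudonymize("jane@example.com")}
```

Here `log_a` and `log_b` carry the same value, so entries from the same user still line up, while `log_c` differs. The output is still personal data in the GDPR sense as long as the key exists, so it reduces exposure rather than removing it.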
Problem 2: Selective Masking Complexity
Different PII fields may need different handling:
- Email: masked in logs visible to support staff, but unmasked for engineering
- SSN: always masked, never unmasked
- User ID: never masked (needed for troubleshooting)
- Request body: sometimes contains PII, sometimes not
Problem 3: Multi-Stage Processing
API logs flow through multiple systems:
Application → Logging framework → Log collector → Storage → Monitoring UI
Each stage may need different masking rules.
Strategy 1: Structured Field Extraction + Selective Masking
Before logging, extract PII fields explicitly:
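// Field names that must never reach the logs in clear text.
const sensitiveFields = ['ssn', 'credit_card', 'bank_account', 'phone', 'medical_data'];

function maskRequest(body) {
  // Shallow copy so the original request body is left untouched.
  const masked = { ...body };
  sensitiveFields.forEach(field => {
    // Check for the key itself so empty or zero values are redacted too.
    if (field in masked) {
      masked[field] = '[REDACTED]';
    }
  });
  return masked;
}

logger.info('API Request', maskRequest(req.body));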
Benefits:
- Application-level control
- No loss of non-sensitive data
- Consistent across all endpoints
Challenges:
- Requires application code changes
- The list of sensitive fields has to be maintained
- Nested objects are harder to mask
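The nested-object challenge can be handled by walking the payload recursively. The following is a minimal sketch in Python rather than a drop-in replacement for the function above; `mask_nested` and the sample payload are hypothetical, and the field list simply mirrors the one used earlier.

```python
from typing import Any

SENSITIVE_FIELDS = {"ssn", "credit_card", "bank_account", "phone", "medical_data"}


def mask_nested(value: Any) -> Any:
    """Return a copy of value with sensitive keys redacted at any depth."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if str(key).lower() in SENSITIVE_FIELDS else mask_nested(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [mask_nested(item) for item in value]
    # Scalars (strings, numbers, None) are returned unchanged.
    return value


request_body = {
    "name": "John Doe",
    "contact": {"phone": "555-123-4567", "email": "john@example.com"},
    "accounts": [{"bank_account": "000123", "label": "primary"}],
}
masked_body = mask_nested(request_body)
```

Because the function builds new dictionaries and lists, the original `request_body` is left untouched and can still be used by the request handler.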
Strategy 2: Observability Platform Native Masking
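Datadog, Splunk, and New Relic all provide built-in PII masking.
Datadog (Agent-side rules in datadog.yaml):
logs_config:
  processing_rules:
    - type: mask_sequences
      name: mask_ssn
      pattern: '\d{3}-\d{2}-\d{4}'
      replace_placeholder: '[REDACTED_SSN]'
    - type: mask_sequences
      name: mask_email
      pattern: '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+'
      replace_placeholder: '[REDACTED_EMAIL]'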
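Splunk (index-time masking):
props.conf:
[api_logs]
SHOULD_LINEMERGE = false
# Apply the transform below to every event of this sourcetype.
TRANSFORMS-mask_pii = mask_ssn
transforms.conf:
[mask_ssn]
# Rewrites _raw, keeping the text before and after the first SSN-like match.
REGEX = (?m)^(.*?)\d{3}-?\d{2}-?\d{4}(.*)$
FORMAT = $1[REDACTED]$2
DEST_KEY = _raw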
Benefits:
- No application code changes
- Centralized policy management
- Update rules without redeployment
Challenges:
- Regexes are error-prone (edge cases)
- Performance overhead (regex matching on high-volume logs)
- False positives (numbers that are not PII)
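False positives can often be reduced by validating a regex match before redacting it. As an illustration for card numbers, the sketch below only masks digit runs that pass the Luhn checksum; the function names and the `[REDACTED_CARD]` placeholder are assumptions, and this is not how any of the platforms above implement their rules.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Check a digit string against the Luhn checksum used by payment cards."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        number = int(char)
        if index % 2 == 1:  # double every second digit, counting from the right
            number *= 2
            if number > 9:
                number -= 9
        total += number
    return total % 10 == 0


def mask_cards(line: str) -> str:
    """Redact only the candidates that look like real card numbers."""
    def replace(match: re.Match) -> str:
        digits = re.sub(r"[ -]", "", match.group())
        return "[REDACTED_CARD]" if luhn_valid(digits) else match.group()

    return CARD_CANDIDATE.sub(replace, line)


masked_line = mask_cards("card=4111 1111 1111 1111 order=1234567890123456")
```

In this example the well-known test card number is redacted while the 16-digit order number, which fails the checksum, is left readable. The extra validation costs CPU time per match, so it trades some of the performance overhead for fewer false positives.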
Strategy 3: Role-Based Log Access + Encryption
Store unmasked logs but restrict access:
- Tier 1: Engineering team: full logs (unmasked)
- Tier 2: DevOps/SRE: masked logs (sensitive PII redacted)
- Tier 3: Support team: masked logs + anonymized user IDs
- Tier 4: Customers: no direct access (aggregate metrics only)
Implementation:
log_access_policy:
  engineering:
    - can_query: all_fields
    - can_export: restricted (no bulk export)
    - audit_trail: required
  devops:
    - can_query: masked_fields_only
    - can_export: yes
    - audit_trail: required
  support:
    - can_query: masked_fields + ticket_id
    - can_export: no
    - audit_trail: required
  customers:
    - can_query: no
    - can_view: error_message_only
    - audit_trail: required
Benefits:
- Full data remains available for debugging
- Access is controlled at role level
- Audit trail tracks who accessed what
Challenges:
- Requires sophisticated access control
- Operational burden (managing roles)
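To make the tiers concrete, the sketch below applies a per-role redaction set at query time and records every access attempt. It is a simplified illustration under stated assumptions: `MASKED_FOR`, `query_log`, and the in-memory `audit_trail` list are hypothetical names, and a real deployment would rely on the log platform's own access controls and a tamper-resistant audit store.

```python
from datetime import datetime, timezone

# Fields hidden from each role; roles not listed here get no access at all.
MASKED_FOR = {
    "engineering": set(),
    "devops": {"ssn", "email", "phone"},
    "support": {"ssn", "email", "phone", "user_id"},
}

audit_trail = []  # stand-in for a durable audit log


def query_log(role: str, user: str, record: dict) -> dict:
    """Return the view of a log record that the given role may see."""
    allowed = role in MASKED_FOR
    # Record the attempt first, so denied requests are audited too.
    audit_trail.append({
        "at": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "role": role,
        "allowed": allowed,
    })
    if not allowed:
        raise PermissionError(f"role {role!r} has no direct log access")
    hidden = MASKED_FOR[role]
    return {key: "[REDACTED]" if key in hidden else value for key, value in record.items()}


record = {"user_id": "12345", "email": "john@example.com", "status": 201}
devops_view = query_log("devops", "alice", record)
```

A customer role is simply absent from the mapping, so any attempt raises `PermissionError` and still leaves an audit entry.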
Strategy 4: Tokenization + Separate Key Management
Replace PII in logs with tokens:
{"name": "tok_pii_def456", "email": "tok_pii_abc123", "user_id": "12345"}
Token lookup (separate, encrypted storage):
tok_pii_abc123 -> john@example.com (encrypted at rest, access-logged)
Benefits:
- Logs contain no raw PII (they are pseudonymized, and become anonymous only if the token mapping is destroyed)
- Users can still be correlated via their tokens
- Token keys are easier to manage and rotate
Challenges:
- Token generation adds performance overhead
- Token lookup requires separate secure storage
- Debugging requires token resolution (an extra step)
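The tokenization flow can be sketched as a small vault that hands out tokens and logs every resolution. This is an assumption-heavy illustration: `TokenVault` is a hypothetical class, and the plain dictionaries stand in for the separate encrypted store described above.

```python
import secrets


class TokenVault:
    """In-memory stand-in for a separate, encrypted token store."""

    def __init__(self) -> None:
        self._token_by_value = {}
        self._value_by_token = {}
        self.access_log = []  # who resolved which token

    def tokenize(self, value: str) -> str:
        """Return the existing token for a value, or mint a new random one."""
        if value not in self._token_by_value:
            token = f"tok_pii_{secrets.token_hex(8)}"
            self._token_by_value[value] = token
            self._value_by_token[token] = value
        return self._token_by_value[value]

    def resolve(self, token: str, requested_by: str) -> str:
        """Look up the original value and record the access."""
        self.access_log.append((requested_by, token))
        return self._value_by_token[token]


vault = TokenVault()
log_entry = {"email": vault.tokenize("john@example.com"), "user_id": "12345"}
```

Because the same value always maps to the same token, log entries for one user can still be correlated, and only callers with vault access can take the extra step of resolving a token during debugging.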
GDPR Compliance Template: API Logging
api_logging_policy:
  collection:
    - Log only necessary fields (request method, path, status code, duration, user_id)
    - Do NOT log: full request body, full response body, raw PII
    - Exceptions documented + justified
  masking:
    - High-risk fields (SSN, credit card, medical): Always redact
    - Medium-risk (email, phone): Redact for non-engineering access
    - Low-risk (user_id, ip_address, timestamp): Keep
  storage:
    - Encryption at rest: Required
    - Encryption in transit: TLS
    - Access control: Role-based
  retention:
    - Operational logs: 30 days (auto-delete)
    - Security audit logs: 1 year (compliance requirement)
    - Anonymized logs: Extended (analytics)
  access:
    - Engineering: Full logs (audit trail required)
    - Operations: Masked logs
    - Support: Limited access via ticket
    - Customers: Never direct access
  auditing:
    - Log all access to logs
    - Alert on bulk exports
    - Monthly compliance review
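The collection and retention rules in this template lend themselves to a small allowlist-based helper. The sketch below is one possible reading of the policy, not a reference implementation; `to_log_record`, `is_expired`, and the field names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Only these fields are ever written; anything else is dropped by default.
ALLOWED_FIELDS = ("method", "path", "status_code", "duration", "user_id")

RETENTION = {
    "operational": timedelta(days=30),
    "security_audit": timedelta(days=365),
}


def to_log_record(event: dict) -> dict:
    """Keep only allowlisted fields, so newly added PII fields never leak by accident."""
    return {field: event[field] for field in ALLOWED_FIELDS if field in event}


def is_expired(written_at: datetime, kind: str, now: datetime) -> bool:
    """True when a log of the given kind is past its retention window."""
    return now - written_at > RETENTION[kind]


event = {
    "method": "POST",
    "path": "/api/users/create",
    "status_code": 201,
    "duration": 42,
    "user_id": "12345",
    "body": {"ssn": "123-45-6789"},  # never reaches the log record
}
record = to_log_record(event)
written = datetime(2025, 3, 8, 14, 23, 15, tzinfo=timezone.utc)
now = datetime(2025, 4, 10, tzinfo=timezone.utc)
```

An allowlist fails closed: a field that nobody classified is simply not logged, whereas a denylist of sensitive names only protects the fields someone remembered to add.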
Testing: Validate Masking
Before deploying to production:
# mask_log is the masking function under test; it is assumed to return a dict.
test_logs = [
    {"endpoint": "/api/users", "ssn": "123-45-6789", "status": 200},
    {"endpoint": "/api/email", "email": "john@example.com", "status": 201},
    {"endpoint": "/api/profile", "user_id": "12345", "status": 200},
]

for log in test_logs:
    masked = mask_log(log)
    # Verify SSN is masked
    assert 'ssn' not in masked or masked['ssn'] == '[REDACTED]'
    # Verify email is masked
    assert 'email' not in masked or masked['email'] == '[REDACTED]'
    # Verify user_id is NOT masked (it is a string, and not every log has one)
    assert masked.get('user_id') == log.get('user_id')
Production Monitoring
Set alerts for unmasked PII in logs:
Alert: "SSN detected in API logs"
Alert: "Unencrypted credit card number in log output"
Alert: "PII query on a public dashboard"
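A simple way to back such alerts is a scanner that runs over a sample of already-masked log lines and reports anything that still looks like PII. The sketch below is illustrative only: `DETECTORS` and `scan` are hypothetical names, the email alert is an added example, and in practice the output would feed the monitoring platform's alerting rather than a Python list.

```python
import re

# Alert message -> pattern that should never match in masked logs.
DETECTORS = {
    "SSN detected in API logs": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "Email address detected in API logs": re.compile(
        r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"
    ),
}


def scan(lines):
    """Return (line_number, alert_message) pairs for every suspected leak."""
    alerts = []
    for number, line in enumerate(lines, start=1):
        for message, pattern in DETECTORS.items():
            if pattern.search(line):
                alerts.append((number, message))
    return alerts


sample = [
    '{"ssn": "[REDACTED]", "status": 200}',
    '{"ssn": "123-45-6789", "status": 200}',
    '{"email": "john@example.com", "status": 201}',
]
alerts = scan(sample)
```

The first line is clean, while the second and third each raise one alert, which points to a masking rule that was skipped or misconfigured upstream.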
Conclusion
API log masking is a balance between observability and privacy. The best approach is to:
- Extract sensitive fields upfront (application level)
- Use platform masking for edge cases
- Enforce role-based access (Engineering > DevOps > Support)
- Maintain an unmasked, encrypted archive (for incident investigation)
- Auto-delete logs once the retention window expires
This allows engineers to debug effectively while maintaining GDPR compliance.