
API Log JSON PII Masking: Observability + GDPR in...

API logs record request/response data in real time. Unmasked logs expose PII on monitoring dashboards.

April 21, 2026 · 6 min read
API logs · GDPR compliance · JSON anonymization · observability · storage limitation

API Log PII Exposure

API logs are essential for troubleshooting. A typical logged request and response look like this:

POST /api/users/create HTTP/1.1
{"name": "John Doe", "email": "john@example.com", "ssn": "123-45-6789", "phone": "555-123-4567"}

Response 201:
{"user_id": "12345", "email": "john@example.com", "created_at": "2025-03-08T14:23:15Z"}

In a monitoring system (Splunk, Datadog, New Relic), these logs are visible to:

  • Engineering team dashboards
  • Support troubleshooting systems
  • Incident response team
  • Sometimes customers (via self-service support portals)

GDPR risk: the PII is now exposed in multiple systems, potentially in non-EU regions.

The Masking Challenge

Problem 1: Information Loss

If you mask the email field entirely:

{"email": "[MASKED]"}

Debugging becomes harder: you can no longer tell whether two log entries come from the same user or from different users.

Problem 2: Selective Masking Complexity

Different PII fields may need different handling:

  • Email: masked in logs that support staff can see, but unmasked for engineering
  • SSN: always masked, never unmasked
  • User ID: never masked (needed for troubleshooting)
  • Request body: sometimes contains PII, sometimes not

Problem 3: Multi-Stage Processing

API logs flow through multiple systems:

Application → Logging framework → Log collector → Storage → Monitoring UI

Each stage may need different masking rules.
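At the logging-framework stage, Python's standard `logging` module lets you rewrite a record before any handler writes it out. A minimal sketch, assuming SSNs in the 123-45-6789 form are what needs catching; `RedactSSNFilter` and the `api.demo` logger name are hypothetical:

```python
import io
import logging
import re

# Hypothetical pattern: US SSNs written as 123-45-6789.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


class RedactSSNFilter(logging.Filter):
    """Rewrites each record so SSN-shaped strings never reach the output."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Merge msg and args first so SSNs passed as %-style arguments are caught.
        record.msg = SSN_RE.sub("[REDACTED]", record.getMessage())
        record.args = None
        return True  # keep the record; it has only been rewritten


# Attach the filter to the handler (not the logger) so it also covers records
# that propagate up from child loggers.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(RedactSSNFilter())

logger = logging.getLogger("api.demo")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("create user ssn=%s", "123-45-6789")
print(stream.getvalue(), end="")
```

Filters attached to a logger only run for records created on that exact logger, which is why the sketch places the filter on the handler. Later stages (collector, storage, UI) would still need their own rules for data that bypasses this handler.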

Strategy 1: Structured Field Extraction + Selective Masking

Before logging, explicitly redact known sensitive fields:

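// Top-level keys that must never reach the logs in clear text
const sensitiveFields = ['ssn', 'credit_card', 'bank_account', 'phone', 'medical_data'];

function maskRequest(req) {
  // Shallow copy so the original request body is not mutated
  const masked = { ...req };
  sensitiveFields.forEach(field => {
    // Check for presence, not truthiness, so values like 0 or '' are masked too
    if (masked[field] !== undefined) {
      masked[field] = '[REDACTED]';
    }
  });
  return masked;
}

// Assumes a winston-style logger: logger.info(message, metadata)
logger.info('API Request', maskRequest(req.body));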

Benefits:

  • Application-level control
  • No loss of non-sensitive data
  • Consistent across all endpoints

Challenges:

  • Requires application code changes
  • The list of sensitive fields must be maintained by hand
  • Nested objects are harder to mask (the function above only checks top-level keys)
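The nested-object problem can be handled by walking the structure recursively. A minimal sketch in Python, assuming sensitive keys are matched by name at any depth; `SENSITIVE_KEYS` and `mask_nested` are hypothetical names, not part of any library:

```python
from typing import Any

# Hypothetical key list; mirrors the sensitiveFields array above.
SENSITIVE_KEYS = {"ssn", "credit_card", "bank_account", "phone", "medical_data"}


def mask_nested(value: Any, placeholder: str = "[REDACTED]") -> Any:
    """Return a copy of value with sensitive keys redacted at every depth."""
    if isinstance(value, dict):
        return {
            # Redact the whole value under a sensitive key (even a dict or list);
            # otherwise keep descending.
            k: placeholder if k in SENSITIVE_KEYS else mask_nested(v, placeholder)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [mask_nested(item, placeholder) for item in value]
    # Scalars (str, int, None, ...) are returned unchanged.
    return value


body = {
    "user_id": "12345",
    "contact": {"phone": "555-123-4567", "city": "Berlin"},
    "accounts": [{"bank_account": "0000-1111", "currency": "EUR"}],
}
masked = mask_nested(body)
print(masked)
```

Because the function builds new dicts and lists, the original request body stays intact for the application code that still needs it.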

Strategy 2: Observability Platform Native Masking

Datadog, Splunk, and New Relic all offer built-in PII masking:

Datadog:

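# datadog.yaml: global rules applied to every log the Agent collects
logs_config:
  processing_rules:
    - type: mask_sequences
      name: mask_ssn
      replace_placeholder: "[MASKED_SSN]"
      pattern: '\d{3}-\d{2}-\d{4}'
    - type: mask_sequences
      name: mask_email
      replace_placeholder: "[MASKED_EMAIL]"
      pattern: '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+'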

Splunk:

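props.conf:
[api_logs]
SHOULD_LINEMERGE = false
TRANSFORMS-anonymize = mask_ssn

transforms.conf:
[mask_ssn]
# Capture the text on either side of the SSN; without $1/$2, FORMAT would
# overwrite the entire event with [REDACTED]. This masks one SSN per event;
# SEDCMD-mask_ssn = s/\d{3}-?\d{2}-?\d{4}/[REDACTED]/g in props.conf masks all.
REGEX = (?m)^(.*)\d{3}-?\d{2}-?\d{4}(.*)$
FORMAT = $1[REDACTED]$2
DEST_KEY = _raw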

Benefits:

  • No application code changes
  • Centralized policy management
  • Update rules without redeployment

Challenges:

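  • Regexes are error-prone (edge cases such as spaces instead of dashes slip through)
  • Performance overhead (regex matching on high-volume logs)
  • False positives (numbers that are not PII but match the pattern, e.g. a 9-digit order ID)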
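The false-positive and edge-case problems are easy to reproduce. This sketch applies the two SSN patterns from the Datadog and Splunk examples using Python's `re` module; the sample log lines are made up:

```python
import re

# Datadog example: dashes required. Splunk example: dashes optional.
SSN_STRICT = re.compile(r"\d{3}-\d{2}-\d{4}")
SSN_LOOSE = re.compile(r"\d{3}-?\d{2}-?\d{4}")

line = "order_id=987654321 ssn=123-45-6789 status=201"
spaced = "ssn=123 45 6789 status=201"

strict = SSN_STRICT.sub("[REDACTED]", line)
loose = SSN_LOOSE.sub("[REDACTED]", line)
missed = SSN_STRICT.sub("[REDACTED]", spaced)

print(strict)  # the order ID survives, the SSN is masked
print(loose)   # false positive: the 9-digit order ID is masked too
print(missed)  # false negative: the space-separated SSN is not masked at all
```

Loosening a pattern trades false negatives for false positives, so platform rules work best as a safety net behind field-level masking rather than as the only control.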

Strategy 3: Role-Based Log Access + Encryption

Store unmasked logs but restrict access:

  • Tier 1 (Engineering team): full logs (unmasked)
  • Tier 2 (DevOps/SRE): masked logs (sensitive PII redacted)
  • Tier 3 (Support team): masked logs + anonymized user IDs
  • Tier 4 (Customers): no direct access (aggregate metrics only)

Implementation:

log_access_policy:
  engineering:
    - can_query: all_fields
    - can_export: restricted (no bulk export)
    - audit_trail: required
    
  devops:
    - can_query: masked_fields_only
    - can_export: yes
    - audit_trail: required
    
  support:
    - can_query: masked_fields + ticket_id
    - can_export: no
    - audit_trail: required
    
  customers:
    - can_query: no
    - can_view: error_message_only
    - audit_trail: required
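Enforcing a policy like this usually happens in the query layer, not in the log file itself. A minimal sketch in Python, assuming the policy is reduced to a per-role set of visible fields; `VISIBLE_FIELDS`, `view_for_role`, and the in-memory `AUDIT_LOG` are hypothetical:

```python
from typing import Any, Dict, List, Optional, Set

# Hypothetical field visibility distilled from the log_access_policy above.
# None means "all fields"; roles missing from the map (e.g. customers) get nothing.
VISIBLE_FIELDS: Dict[str, Optional[Set[str]]] = {
    "engineering": None,
    "devops": {"endpoint", "status", "duration_ms", "user_id"},
    "support": {"endpoint", "status", "ticket_id"},
}

AUDIT_LOG: List[Dict[str, Any]] = []


def view_for_role(record: Dict[str, Any], role: str) -> Dict[str, Any]:
    """Return the log record as the given role is allowed to see it."""
    granted = role in VISIBLE_FIELDS
    # Audit every attempt, including denied ones (audit_trail: required).
    AUDIT_LOG.append({"role": role, "granted": granted})
    if not granted:
        raise PermissionError(f"role {role!r} has no direct log access")
    allowed = VISIBLE_FIELDS[role]
    if allowed is None:
        return dict(record)  # engineering: unmasked copy
    return {k: (v if k in allowed else "[REDACTED]") for k, v in record.items()}


record = {
    "endpoint": "/api/users/create",
    "status": 201,
    "user_id": "12345",
    "email": "john@example.com",
    "ticket_id": "T-1001",
}
print(view_for_role(record, "support"))
```

Redacting with a placeholder rather than dropping keys keeps the record's shape, so support staff can still see that a field existed without seeing its value.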

Benefits:

  • Full data available para sa debugging
  • Access is controlled at role level
  • Audit trail tracks who accessed what

Challenges:

  • Requires sophisticated access control
  • Operational burden (managing roles)

Strategy 4: Tokenization + Separate Key Management

Replace PII in logs with tokens:

{"name": "tok_pii_def456", "email": "tok_pii_abc123", "user_id": "12345"}

Token lookup (separate, encrypted storage):

tok_pii_abc123 -> john@example.com (stored encrypted; every lookup is access-logged)

Benefits:

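  • Logs contain no raw PII (under GDPR this is pseudonymization, not anonymization, because the vault can reverse it)
  • Users can still be correlated via their tokens
  • The token vault and its keys can be secured and rotated independently of the logs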

Challenges:

  • Token generation adds performance overhead
  • Token lookup requires separate secure storage
  • Debugging requires resolving tokens (an extra step)
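A minimal sketch of the token scheme in Python. It assumes deterministic tokens derived with a keyed HMAC-SHA256, so the same email always maps to the same token, and it uses an in-memory dict as a stand-in for the separate encrypted vault. `TokenVault` and the 12-hex-character token length are illustrative choices, not a standard:

```python
import hashlib
import hmac
import secrets
from typing import Dict, List


class TokenVault:
    """In-memory stand-in for a separate, encrypted token store (hypothetical)."""

    def __init__(self, key: bytes) -> None:
        self._key = key
        self._lookup: Dict[str, str] = {}  # in production: encrypted storage
        self.access_log: List[str] = []

    def tokenize(self, value: str) -> str:
        # Keyed HMAC: deterministic (logs stay correlatable) but not reversible
        # without the key. 12 hex chars = 48 bits, enough for a demo only.
        digest = hmac.new(self._key, value.encode(), hashlib.sha256).hexdigest()
        token = "tok_pii_" + digest[:12]
        self._lookup[token] = value
        return token

    def resolve(self, token: str, requester: str) -> str:
        # Every lookup is recorded before the value is returned.
        self.access_log.append(f"{requester} resolved {token}")
        return self._lookup[token]


vault = TokenVault(secrets.token_bytes(32))
entry = {"email": vault.tokenize("john@example.com"), "user_id": "12345"}
print(entry)
```

A keyed HMAC matters here: an unkeyed SHA-256 of an email can be reversed by hashing candidate addresses, whereas the HMAC key never leaves the vault.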

GDPR Compliance Template: API Logging

api_logging_policy:
  collection:
    - Log only necessary fields (request method, path, status code, duration, user_id)
    - Do NOT log: full request body, full response body, raw PII
    - Exceptions documented + justified
    
  masking:
    - High-risk fields (SSN, credit card, medical): Always redact
    - Medium-risk (email, phone): Redact for non-engineering access
    - Low-risk (user_id, ip_address, timestamp): Keep (IP addresses still count as personal data under GDPR)
    
  storage:
    - Encryption at rest: Required
    - Encryption in transit: TLS
    - Access control: Role-based
    
  retention:
    - Operational logs: 30 days (auto-delete)
    - Security audit logs: 1 year (compliance requirement)
    - Anonymized logs: Extended (analytics)
    
  access:
    - Engineering: Full logs (audit trail required)
    - Operations: Masked logs
    - Support: Limited access via ticket
    - Customers: Never direct access
    
  auditing:
    - Log all access to logs
    - Alert on bulk exports
    - Monthly compliance review
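The collection and retention rules translate fairly directly into code. A sketch, assuming an allowlist (any field not listed is dropped) and treating "1 year" as 365 days; `ALLOWED_FIELDS`, `RETENTION`, and the log-class names are hypothetical:

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict

# Fields the collection policy treats as necessary; everything else is dropped.
ALLOWED_FIELDS = {"method", "path", "status", "duration_ms", "user_id"}

# Retention windows from the policy (1 year approximated as 365 days).
RETENTION = {
    "operational": timedelta(days=30),
    "security_audit": timedelta(days=365),
}


def to_log_entry(event: Dict[str, Any]) -> Dict[str, Any]:
    """Allowlist, not blocklist: new or unexpected fields are dropped by default."""
    return {k: v for k, v in event.items() if k in ALLOWED_FIELDS}


def is_expired(log_class: str, logged_at: datetime, now: datetime) -> bool:
    """True once an entry has outlived its class's retention window."""
    return now - logged_at > RETENTION[log_class]


event = {
    "method": "POST",
    "path": "/api/users/create",
    "status": 201,
    "duration_ms": 42,
    "user_id": "12345",
    "body": {"ssn": "123-45-6789"},
}
print(to_log_entry(event))
```

The allowlist is the safer default for the "Do NOT log full request body" rule: a developer adding a new PII field to an endpoint does not silently leak it into logs.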

Testing: Validate Masking

Before deploying to production:

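# Hypothetical mask_log for this test: redacts high- and medium-risk fields
# (per the policy template above) and leaves user_id untouched.
REDACT_FIELDS = {"ssn", "credit_card", "email", "phone"}

def mask_log(log):
    return {k: "[REDACTED]" if k in REDACT_FIELDS else v for k, v in log.items()}

test_logs = [
    {"endpoint": "/api/users", "ssn": "123-45-6789", "status": 200},
    {"endpoint": "/api/email", "email": "john@example.com", "status": 201},
    {"endpoint": "/api/profile", "user_id": "12345", "status": 200},
]

for log in test_logs:
    masked = mask_log(log)

    # Verify SSN is masked
    assert 'ssn' not in masked or masked['ssn'] == '[REDACTED]'

    # Verify email is masked
    assert 'email' not in masked or masked['email'] == '[REDACTED]'

    # Verify user_id is NOT masked (only when present; it is the string "12345")
    if 'user_id' in log:
        assert masked['user_id'] == log['user_id']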

Production Monitoring

Set alerts for unmasked PII in logs:

Alert: "SSN detected in API logs"
Alert: "Unencrypted credit card in log output"
Alert: "PII query on public dashboard"
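A log-scanning check behind the first two alerts could look like the sketch below. It rests on assumptions: SSNs are detected by a dashed pattern, and card-number candidates (13 to 19 digits) are confirmed with the Luhn checksum to cut false positives. `pii_findings` and its category names are hypothetical:

```python
import re
from typing import List

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# Card-number candidates: 13-19 digits, optionally separated by spaces or dashes.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_ok(number: str) -> bool:
    """Luhn checksum: double every second digit from the right."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def pii_findings(line: str) -> List[str]:
    """Return the PII categories found in one log line (empty list if clean)."""
    findings = []
    if SSN_RE.search(line):
        findings.append("ssn")
    if any(luhn_ok(m.group()) for m in CARD_RE.finditer(line)):
        findings.append("credit_card")
    return findings


for line in ["ssn=123-45-6789", "card=4111 1111 1111 1111", "user_id=12345 status=200"]:
    for category in pii_findings(line):
        print(f"ALERT: unmasked {category} in API logs")
```

The Luhn step means a random 16-digit ID only has about a 1-in-10 chance of triggering the card alert, rather than every long number doing so.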

Conclusion

API log masking ay balance sa observability + privacy. Best approach ay:

  1. Extract sensitive fields upfront (application level)
  2. Use platform masking as a safety net for edge cases
  3. Enforce role-based access (Engineering > DevOps > Support)
  4. Maintain an encrypted, unmasked archive (for incident investigation)
  5. Auto-delete after retention window

This allows engineers to debug effectively while maintaining GDPR compliance.

Ready to protect your data?

Start anonymizing PII with 285+ entity types in 48 languages.