
Confluence Wiki + Customer PII Screenshots...

Engineering teams document their workflows in Confluence. That documentation often includes screenshots of customer data.

April 21, 2026 · 6 min read
Tags: Confluence GDPR, internal wiki PII, customer data, documentation privacy, data minimization

The Confluence PII Problem

Confluence, a wiki and collaboration tool, often serves as the central documentation repository. Teams create pages such as:

Page: "How to Debug Customer Account Issues"
Content:
  1. Log in to the admin dashboard
  2. Search customer by email
  3. Review account balance + recent transactions
  4. Screenshot: [shows customer name, email, account balance, credit card last-4]

The page is accessible to:

  • Engineering team (intended)
  • Product team (reads it accidentally)
  • Sales (has access to the documentation space)
  • Contractors (temporary Confluence access)
  • Sometimes: publicly indexed (search engines)

GDPR exposure: customer data is stored in the wiki, accessed by 50+ people, searchable, and archived indefinitely.

Why Confluence is a PII Risk Vector

  1. Broad access: wiki spaces usually have wide access (e.g. "read-only for the whole company")
  2. Searchability: full-text search makes pages with customer data, including the text around screenshots, easy to find
  3. Version history: old versions remain accessible (even if a newer version removed the content)
  4. Audit trail gaps: access logs are not detailed (who read which page?)
  5. Copy-paste ease: anyone can copy content and share it via email or Slack
  6. Archival: pages remain searchable years after they stop being relevant

Common Confluence PII Scenarios

Scenario 1: How-To Documentation

Page: "Refund Processing Workflow"

Steps:
1. Access customer account
2. Verify payment method
3. Process refund

Example:
[Screenshot]
  Customer: John Doe
  Email: john@example.com
  Phone: 555-1234
  Order: #12345
  Refund: $99.99

Access: ~100 employees (customer service, finance, operations)

Scenario 2: Troubleshooting Documentation

Page: "Error Code 5042: Account Balance Error - Troubleshooting Guide"

Root cause:
"This error occurs when customer payment fails..."

[Screenshot]
  Customer account: johndoe_2005
  Balance: $-52.34
  Payment method: Visa ****1234
  Status: Failed

Scenario 3: Training Documentation (New Employee Onboarding)

Page: "Customer Service Training - Module 1"

[Multiple screenshots showing real customer accounts used in training]
  - Customer 1: Jane Smith, email, phone, account status
  - Customer 2: Bob Wilson, address, order history
  - Customer 3: Maria Garcia, subscription details

Strategy 1: Template-Based Screenshots (No Real Data)

Create mock customer accounts for documentation:

Before:
[Real customer data in the screenshot]

After:
[Mock customer: "Sample Customer One", fictional data]

Implementation:

  1. Create a test account in the production environment:

    • Email: docs+demo@company.com
    • Name: "Demo User"
    • Phone: "555-DEMO-DEMO"
    • Address: "123 Demo St, Demo City, DC 12345"
  2. Use the test account for all documentation screenshots

  3. Policy: "Never screenshot real customer data; use demo account"

Benefits:

  • Screenshots contain only fictional data
  • Real customers never appear in documentation
  • Easy to implement

Challenges:

  • The test account needs to be maintained
  • May need multiple test accounts (different customer types)
  • Existing documentation may need remediation
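
The need for several test accounts can be eased by defining the demo personas in one place. The following is a minimal sketch, not the author's tooling: the `DemoAccount` structure, the `build_demo_accounts` and `looks_like_demo` helpers, and the three customer types are all hypothetical. Only the `docs+demo@company.com` address pattern and the demo name, phone, and address come from the example above.

```python
from dataclasses import dataclass

# Hypothetical customer types; adjust to whatever your product actually has.
CUSTOMER_TYPES = ["standard", "premium", "suspended"]


@dataclass(frozen=True)
class DemoAccount:
    email: str
    name: str
    phone: str
    address: str
    customer_type: str


def build_demo_accounts(domain: str = "company.com") -> list[DemoAccount]:
    """Return one clearly fictional account per customer type."""
    accounts = []
    for customer_type in CUSTOMER_TYPES:
        accounts.append(
            DemoAccount(
                # Plus-addressing keeps every demo address under one docs mailbox.
                email=f"docs+demo-{customer_type}@{domain}",
                name=f"Demo User ({customer_type})",
                phone="555-DEMO-DEMO",
                address="123 Demo St, Demo City, DC 12345",
                customer_type=customer_type,
            )
        )
    return accounts


def looks_like_demo(email: str) -> bool:
    """Cheap check a reviewer or script can run on an email seen in a screenshot."""
    return email.startswith("docs+demo")


if __name__ == "__main__":
    for account in build_demo_accounts():
        print(account.email, "|", account.name)
```

Keeping the personas in code means a reviewer can tell at a glance whether an email address in a screenshot belongs to a demo account or to a real customer.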

Strategy 2: Data Masking in Screenshots

Before posting a screenshot to Confluence, mask the sensitive fields:

Before:
Customer: John Doe
Email: john@example.com
Phone: 555-1234

After:
Customer: [NAME]
Email: [EMAIL]
Phone: [PHONE]
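
For text that accompanies a screenshot (captions, pasted log lines, OCR output), the before/after transformation above can be sketched with regular expressions. This is an illustration under stated assumptions, not a complete masking tool: the `mask_labeled_fields` function is hypothetical, and the `Customer` / `Email` / `Phone` label set is taken from the example and will differ per dashboard.

```python
import re

# Label -> placeholder, following the before/after example.
# The label names are assumptions; extend them for your own dashboard.
FIELD_PLACEHOLDERS = {
    "Customer": "[NAME]",
    "Email": "[EMAIL]",
    "Phone": "[PHONE]",
}

# Matches "Label: value" at the start of a line, keeping any indentation.
_FIELD_RE = re.compile(
    r"^(?P<indent>[ \t]*)(?P<label>"
    + "|".join(FIELD_PLACEHOLDERS)
    + r"):[ \t]*(?P<value>.+)$",
    re.MULTILINE,
)


def mask_labeled_fields(text: str) -> str:
    """Replace the value of each known label with its placeholder."""

    def _replace(match: re.Match) -> str:
        label = match.group("label")
        return f"{match.group('indent')}{label}: {FIELD_PLACEHOLDERS[label]}"

    return _FIELD_RE.sub(_replace, text)


if __name__ == "__main__":
    sample = "Customer: John Doe\nEmail: john@example.com\nPhone: 555-1234"
    print(mask_labeled_fields(sample))
```

Values that appear without a known label pass through unchanged, so a manual review of the result is still needed.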

Tools:

  • Screenshot tools with built-in annotation (Snagit, Lightshot)
  • Browser extensions that mask PII before the screenshot is taken
  • Post-processing scripts (ImageMagick, Python PIL)

Example workflow:

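# Take the screenshot (saved as screenshot.png)

# Post-process to mask PII
# (mask_pii_in_image.py stands for your own masking script: input path, then output path)
python mask_pii_in_image.py screenshot.png screenshot_masked.png

# Upload the masked version to Confluence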

Benefits:

  • Screenshots are safe (PII masked)
  • Still visually helpful (layout visible)

Challenges:

  • Manual effort (masking must happen before upload)
  • Easy to forget (the habit of posting unmasked screenshots)
  • Masking is imperfect (it may miss some PII)

Strategy 3: Access Control + Retention Policy

Restrict access to pages containing PII:

confluence_pii_access_control:
  pages_with_real_data:
    - Access: Engineering team only (10 people)
    - Sharing: Requires manager approval
    - Retention: 30 days, then auto-delete
    
  pages_with_masked_data:
    - Access: Engineering + Product + Support (~80 people)
    - Retention: No auto-delete (documents remain useful)
    
  pages_with_fictional_data:
    - Access: Open (anyone in the company)
    - Retention: Indefinite (safest option)
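
The three tiers above can be expressed as a small lookup so that scripts and reviewers apply the same rules. This is a sketch under assumptions: the label names (`real-data`, `masked-data`, `fictional-data`), the `Tier` structure, and the `tier_for` and `is_expired` helpers are hypothetical and not built into Confluence. Treating unlabeled pages as real data is also a design choice of this sketch, not something the tiers above prescribe.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class Tier:
    audience: str
    retention_days: Optional[int]  # None means no auto-delete


# Values mirror the three tiers listed above; the dictionary keys are
# hypothetical label names, not built-in Confluence labels.
TIERS = {
    "real-data": Tier("Engineering team only", 30),
    "masked-data": Tier("Engineering + Product + Support", None),
    "fictional-data": Tier("Open (anyone in the company)", None),
}


def tier_for(labels: set[str]) -> Tier:
    """Pick the most restrictive tier whose label is on the page."""
    for key in ("real-data", "masked-data", "fictional-data"):
        if key in labels:
            return TIERS[key]
    # Assumption: unlabeled pages are treated as real data until classified.
    return TIERS["real-data"]


def is_expired(labels: set[str], created: date, today: date) -> bool:
    """True when the page has outlived its tier's retention window."""
    retention = tier_for(labels).retention_days
    if retention is None:
        return False
    return today - created > timedelta(days=retention)
```

For example, `is_expired({"real-data"}, date(2026, 1, 1), date(2026, 2, 1))` is true because the page is 31 days old, while a page labeled `masked-data` never expires.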

Implementation:

Confluence permissions:

Page: "Refund Processing with Real Data"
Restrictions:
  - Only group: "billing-engineers"
  - Edit: Only 2 people (manager + specialist)
  - View history: Disabled (prevent viewing old versions)
  - Comments: Disabled (prevent discussion)
  - Export: Disabled (prevent copy-out)

Retention automation:

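# Confluence API script (sketch: get_pages_with_tag and log_audit_event are
# placeholder helpers wrapping your own Confluence client and audit log)
from datetime import date

RETENTION_DAYS = 30
today = date.today()

for page in get_pages_with_tag('contains-real-pii'):
    # page.created is assumed to be a datetime.date
    age_days = (today - page.created).days

    if age_days > RETENTION_DAYS:
        page.archive()  # Or delete
        log_audit_event(f"Archived {page.title} after {RETENTION_DAYS} days")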

Benefits:

  • Access restricted (only need-to-know)
  • Automatic retention (no manual cleanup)
  • Audit trail (who accessed, when)

Challenges:

  • Operational overhead (manage access control)
  • Role creep (people accumulate access)
  • Complexity (maintaining permissions across teams)

Strategy 4: Separate Spaces (Public vs. Sensitive)

Create two wiki spaces:

Confluence space 1: "Engineering Documentation (Public)"
  - Content: How-tos, architecture, best practices
  - NO customer data, NO real examples
  - Access: Anyone in the company (including contractors, temps)
  - Retention: Indefinite
  
Confluence space 2: "Sensitive Documentation (Restricted)"
  - Content: Real customer examples, troubleshooting that involves data
  - Customer data is masked or fictional
  - Access: Engineering + support teams only (50 people)
  - Retention: 30 days auto-delete

Benefits:

  • Clear separation (public = safe)
  • Default deny (the sensitive space is restricted by default)
  • Easy governance (two buckets)

Challenges:

  • Requires discipline (teams may put real data in the public space anyway)
  • Documentation duplication (some pages may need a version for each space)
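
The discipline problem can be partly automated by flagging PII-labeled pages that live outside the sensitive space. A minimal sketch, assuming hypothetical space keys (`ENGDOCS`, `SENSITIVE`), a hypothetical `contains-pii` label, and page metadata that has already been fetched into simple `Page` objects:

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    title: str
    space_key: str
    labels: set[str] = field(default_factory=set)


# Hypothetical space keys and label name; substitute your own.
PUBLIC_SPACE = "ENGDOCS"
SENSITIVE_SPACE = "SENSITIVE"
PII_LABEL = "contains-pii"


def misplaced_pages(pages: list[Page]) -> list[Page]:
    """Return pages carrying the PII label outside the sensitive space."""
    return [
        page
        for page in pages
        if PII_LABEL in page.labels and page.space_key != SENSITIVE_SPACE
    ]


if __name__ == "__main__":
    pages = [
        Page("Architecture Overview", PUBLIC_SPACE),
        Page("Refund Processing Workflow", PUBLIC_SPACE, {PII_LABEL}),
        Page("Error Code 5042", SENSITIVE_SPACE, {PII_LABEL}),
    ]
    for page in misplaced_pages(pages):
        print(f"Move or clean up: {page.title} ({page.space_key})")
```

This only catches pages that someone has labeled. Unlabeled pages with real data still need a content scan.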

GDPR Compliance: Confluence Policy

confluence_pii_policy:
  creation:
    - Before writing documentation: Assess whether real data is needed
    - If no: Use fictional / masked data
    - If yes: Document business justification
    
  real_data_handling:
    - Pages containing real customer data are tagged [CONTAINS_PII]
    - The tag automatically restricts access to the team only
    - Manager approval is required before posting
    - A screenshot template is provided (pre-masked version)
    - Retention: 30 days max
    
  fictional_data_handling:
    - A demo account is provided (docs+demo@company.com)
    - Use the demo account in all new documentation
    - Old pages are migrated to demo data over time
    - No access control needed (fictional = safe)
    
  access_control:
    - [CONTAINS_PII] pages: Engineering team only
    - Regular pages: Open access
    - Audit: Confluence admin logs all access
    
  retention:
    - Pages tagged [CONTAINS_PII]: Auto-delete after 30 days
    - Regular pages: Indefinite
    - Version history: Keep, but restrict access per parent page
    
  audit:
    - Monthly compliance report: How many [CONTAINS_PII] pages exist?
    - If >5: Investigation required
    - Quarterly: Sample pages, verify no unmasked PII
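
The monthly audit rule in the policy above (count the tagged pages, investigate when there are more than five) can be sketched as a small report function. The `ComplianceReport` structure, the `monthly_report` function, and the dict-based page shape are assumptions for illustration. Only the `CONTAINS_PII` tag and the threshold of 5 come from the policy.

```python
from dataclasses import dataclass

PII_TAG = "CONTAINS_PII"  # tag name from the policy above
MAX_PII_PAGES = 5         # policy: more than 5 tagged pages triggers an investigation


@dataclass(frozen=True)
class ComplianceReport:
    pii_page_count: int
    pii_page_titles: tuple[str, ...]
    investigation_required: bool


def monthly_report(pages: list[dict]) -> ComplianceReport:
    """Summarize tagged pages. Each page is a dict with 'title' and 'tags'."""
    tagged = [p["title"] for p in pages if PII_TAG in p.get("tags", ())]
    return ComplianceReport(
        pii_page_count=len(tagged),
        pii_page_titles=tuple(sorted(tagged)),
        investigation_required=len(tagged) > MAX_PII_PAGES,
    )
```

Exactly five tagged pages does not trigger an investigation, since the policy says "more than 5"; a sixth does.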

Testing: Confluence PII Audit

Before compliance review:

import re

# Assumes `confluence` is your API client and `is_restricted(page)` is your
# own helper that reports whether a page has view restrictions.
def audit_confluence_for_pii():
    all_pages = confluence.get_all_pages()

    for page in all_pages:
        # Check for email patterns
        emails = re.findall(r'\S+@\S+\.\S+', page.content)
        if emails:
            print(f"WARNING: {page.title} may contain emails: {emails[:3]}")

        # Check for phone patterns (\b avoids matching inside longer digit runs)
        phones = re.findall(r'\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b', page.content)
        if phones:
            print(f"WARNING: {page.title} may contain phones: {phones[:3]}")

        # Check for SSN patterns
        ssns = re.findall(r'\b\d{3}-\d{2}-\d{4}\b', page.content)
        if ssns:
            print(f"CRITICAL: {page.title} contains SSN patterns!")

        # Check for access control (SSN hits count as PII too)
        if 'pii' in page.title.lower() or emails or phones or ssns:
            if not is_restricted(page):
                print(f"CRITICAL: {page.title} may contain PII but is not access-restricted!")

Remediation Template

For pages with unmasked PII:

1. Screenshot the original page (archive)
2. Delete the sensitive content or mask it
3. Re-post the masked version
4. Audit trail: Document what was removed + why
5. Communication: Notify the team that the page was updated
6. Retention: Schedule deletion in 30 days
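
Steps 4 to 6 of the template can be captured in a small record so that every remediation is logged the same way. This is a sketch only: the `RemediationRecord` structure and its field names are hypothetical, and the 30-day window is the one stated in step 6.

```python
from dataclasses import dataclass
from datetime import date, timedelta

RETENTION_DAYS = 30  # step 6: schedule deletion in 30 days


@dataclass(frozen=True)
class RemediationRecord:
    page_title: str
    removed: str         # what was removed or masked (step 4)
    reason: str          # why it was removed (step 4)
    remediated_on: date

    @property
    def deletion_due(self) -> date:
        """Date on which the scheduled deletion from step 6 falls due."""
        return self.remediated_on + timedelta(days=RETENTION_DAYS)

    def notice(self) -> str:
        """Short message for the team (step 5)."""
        return (
            f'"{self.page_title}" was updated on {self.remediated_on.isoformat()}: '
            f"{self.removed} removed ({self.reason})."
        )
```

The record deliberately describes what was removed in general terms (for example "customer name, email, phone") rather than repeating the values themselves, so the audit trail does not become another copy of the PII.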

Conclusion

Confluence + PII requires proactive management. The best approach is to:

  1. Prefer fictional data (demo accounts for documentation)
  2. Mask real data (before posting, using masking tools)
  3. Restrict access (engineering team only for sensitive pages)
  4. Auto-delete (30-day retention for pages with real data)
  5. Audit regularly (quarterly PII scan of the wiki)
  6. Train teams (reminder: use demo data, not real customers)

This balances documentation value (engineering teams need examples) with privacy (GDPR compliance).
