The Confluence PII Problem
Confluence (a wiki and collaboration tool) often serves as the central documentation repository. Teams create pages such as:
Page: "How to Debug Customer Account Issues"
Content:
1. Log in to the admin dashboard
2. Search customer by email
3. Review account balance + recent transactions
4. Screenshot: [shows customer name, email, account balance, credit card last-4]
The page is accessible to:
- Engineering team (intended)
- Product team (comes across it accidentally)
- Sales (has access to the documentation space)
- Contractors (temporary Confluence access)
- Sometimes: Publicly indexed (search engines)
GDPR exposure: Customer data is stored in the wiki, accessed by 50+ people, searchable, and archived indefinitely.
Why Confluence is a PII Risk Vector
- Broad access: Wiki spaces usually have wide access (for example, read-only for the whole company)
- Searchability: Full-text search makes customer data on pages easy to find, including text that accompanies screenshots
- Version history: Old versions remain accessible even after the content is deleted from the current version
- Audit trail gaps: Access logs are often not detailed enough to show who read which page
- Copy-paste ease: Anyone can copy content and share it via email or Slack
- Archival: Pages remain searchable years after they stop being relevant
Common Confluence PII Scenarios
Scenario 1: How-To Documentation
Page: "Refund Processing Workflow"
Steps:
1. Access customer account
2. Verify payment method
3. Process refund
Example:
[Screenshot]
Customer: John Doe
Email: john@example.com
Phone: 555-1234
Order: #12345
Refund: $99.99
Access: ~100 employees (customer service, finance, operations)
Scenario 2: Troubleshooting Documentation
Page: "Error Code 5042: Account Balance Error - Troubleshooting Guide"
Root cause:
"This error occurs when customer payment fails..."
[Screenshot]
Customer account: johndoe_2005
Balance: $-52.34
Payment method: Visa ****1234
Status: Failed
Scenario 3: Training Documentation (New Employee Onboarding)
Page: "Customer Service Training - Module 1"
[Multiple screenshots showing real customer accounts used in training]
- Customer 1: Jane Smith, email, phone, account status
- Customer 2: Bob Wilson, address, order history
- Customer 3: Maria Garcia, subscription details
Strategy 1: Template-Based Screenshots (No Real Data)
Create mock customer accounts for documentation:
Before:
[Real customer data in screenshot]
After:
[Mock customer: "Sample Customer One", fictional data]
Implementation:
- Create a test account in the production environment:
  - Email: docs+demo@company.com
  - Name: "Demo User"
  - Phone: "555-DEMO-DEMO"
  - Address: "123 Demo St, Demo City, DC 12345"
- Use the test account for all documentation screenshots
- Policy: "Never screenshot real customer data; use demo account"
Benefits:
- Screenshots contain only fictional data
- Real customers never appear in documentation
- Easy to implement
Challenges:
- Test account needs to be maintained
- May need multiple test accounts (different customer types)
- Existing documentation may need remediation
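The "multiple test accounts" challenge can be eased by generating one clearly fictional profile per customer type. The following is a minimal sketch, not a prescribed implementation: the `DemoCustomer` type, the `make_demo_customers` helper, the customer-type names, and the `docs+demo-<type>@company.com` address pattern are all hypothetical, extending the demo account values listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DemoCustomer:
    """A clearly fictional customer profile for documentation screenshots."""
    customer_type: str
    name: str
    email: str
    phone: str
    address: str


def make_demo_customers(customer_types):
    """Build one fictional profile per customer type (type names are illustrative)."""
    customers = []
    for ctype in customer_types:
        slug = ctype.lower().replace(" ", "-")
        customers.append(
            DemoCustomer(
                customer_type=ctype,
                name=f"Demo User ({ctype})",
                # Hypothetical plus-addressing scheme based on docs+demo@company.com
                email=f"docs+demo-{slug}@company.com",
                phone="555-DEMO-DEMO",
                address="123 Demo St, Demo City, DC 12345",
            )
        )
    return customers


# Example: one demo account per customer type that appears in the docs
for customer in make_demo_customers(["Free Trial", "Subscriber", "Refund Pending"]):
    print(customer.customer_type, customer.email)
```

Keeping the profiles in code means a new customer type only needs one more entry in the list, and every profile stays recognizably fake.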
Strategy 2: Data Masking in Screenshots
Before posting a screenshot to Confluence, mask sensitive fields:
Before:
Customer: John Doe
Email: john@example.com
Phone: 555-1234
After:
Customer: [NAME]
Email: [EMAIL]
Phone: [PHONE]
Tools:
- Screenshot tools with built-in annotation (Snagit, Lightshot)
- Browser extensions that mask PII before the screenshot is taken
- Post-processing scripts (ImageMagick, Python PIL)
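Example workflow:
# Take screenshot (saved as screenshot.png)
# Post-process to mask PII (mask_pii_in_image.py is an in-house script; arguments assumed to be input, then output)
python mask_pii_in_image.py screenshot.png screenshot_masked.png
# Upload the masked version (screenshot_masked.png) to Confluence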
Benefits:
- Screenshots are safe (PII masked)
- Still visually helpful (layout visible)
Challenges:
- Manual effort (need to mask before upload)
- Easy to forget (habit of posting unmasked)
- Masking is imperfect (may miss some PII)
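The same placeholder substitution can be automated for text that accompanies a screenshot, such as captions or pasted log lines. This is a minimal sketch, assuming simple regular expressions are acceptable for a first pass. The `mask_pii_text` name and both patterns are illustrative. Names cannot be detected this way, and unusual formats will slip through, so manual review is still needed.

```python
import re

# Illustrative patterns only; real data will contain formats these miss.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Matches 7-digit (555-1234) and 10-digit (415-555-1234) US-style numbers.
PHONE_RE = re.compile(r"\b(?:\d{3}[-.\s])?\d{3}[-.\s]\d{4}\b")


def mask_pii_text(text: str) -> str:
    """Replace email addresses and phone numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


sample = "Email: john@example.com\nPhone: 555-1234"
print(mask_pii_text(sample))
```

Running the function twice gives the same result as running it once, so it is safe to apply in a pre-upload hook even if the text was already masked.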
Strategy 3: Access Control + Retention Policy
Restrict access to pages containing PII:
confluence_pii_access_control:
  pages_with_real_data:
    - Access: Engineering team only (10 people)
    - Sharing: Requires manager approval
    - Retention: 30 days, then auto-delete
  pages_with_masked_data:
    - Access: Engineering + Product + Support (~80 people)
    - Retention: No auto-delete (documents remain useful)
  pages_with_fictional_data:
    - Access: Open (anyone in the company)
    - Retention: Indefinite (safest option)
Implementation:
Confluence permissions:
  Page: "Refund Processing with Real Data"
  Restrictions:
    - Only group: "billing-engineers"
    - Edit: Only 2 people (manager + specialist)
    - View history: Disabled (prevent viewing old versions)
    - Comments: Disabled (prevent discussion)
    - Export: Disabled (prevent copy-out)
Retention automation:
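# Confluence API script
# get_pages_with_tag and log_audit_event are site-specific helpers, assumed to exist
from datetime import date

today = date.today()
for page in get_pages_with_tag('contains-real-pii'):
    # page.created is assumed to be a datetime.date
    age_days = (today - page.created).days
    if age_days > 30:
        page.archive()  # Or delete
        log_audit_event(f"Archived {page.title} after 30 days")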
Benefits:
- Access restricted (only need-to-know)
- Automatic retention (no manual cleanup)
- Audit trail (who accessed, when)
Challenges:
- Operational overhead (manage access control)
- Role creep (people accumulate access)
- Complexity (maintaining permissions across teams)
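The three tiers and their retention rules can be expressed as data, which keeps the cleanup decision separate from how pages are fetched. This is a sketch under stated assumptions: the tier names, the `PageInfo` record, and both function names are hypothetical. Nothing here calls Confluence. It only decides which pages are due for removal.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Retention per tier in days; None means keep indefinitely.
# Tier names are hypothetical labels for the three categories above.
RETENTION_DAYS = {
    "real_data": 30,
    "masked_data": None,
    "fictional_data": None,
}


@dataclass
class PageInfo:
    title: str
    tier: str
    created: date


def is_expired(page: PageInfo, today: date) -> bool:
    """Return True when the page has outlived its tier's retention period."""
    limit: Optional[int] = RETENTION_DAYS[page.tier]
    if limit is None:
        return False
    return (today - page.created).days > limit


def pages_to_remove(pages, today):
    """Filter a list of pages down to those past their retention limit."""
    return [p for p in pages if is_expired(p, today)]
```

Passing `today` in explicitly, rather than reading the clock inside the function, makes the rule easy to test against fixed dates.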
Strategy 4: Separate Spaces (Public vs. Sensitive)
Create two wiki spaces:
Confluence space 1: "Engineering Documentation (Public)"
- Content: How-tos, architecture, best practices
- NO customer data, NO real examples
- Access: Anyone in the company (including contractors, temps)
- Retention: Indefinite
Confluence space 2: "Sensitive Documentation (Restricted)"
- Content: Real customer examples, troubleshooting that involves data
- Customer data is masked or fictional
- Access: Engineering + support teams only (50 people)
- Retention: 30 days auto-delete
Benefits:
- Clear separation (public = safe)
- Default deny (sensitive space is restricted by default)
- Easy governance (two buckets)
Challenges:
- Requires discipline (teams may put real data in the public space anyway)
- Documentation duplication (may need versions for both spaces)
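The discipline problem can be partly addressed with a pre-publish check that refuses obviously sensitive content in the open space. This is a sketch, not a complete control: the space key `ENGPUB`, the function names, and the deliberately simple patterns are assumptions. How such a check would be wired into Confluence (for example, as a review step) is left out.

```python
import re

# Hypothetical key for the open "Engineering Documentation (Public)" space.
PUBLIC_SPACES = {"ENGPUB"}

# Deliberately simple, illustrative patterns.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "card_last4": re.compile(r"\*{4}\s?\d{4}"),
}

# The demo account address is fictional, so it is allowed everywhere.
ALLOWED_EMAILS = {"docs+demo@company.com"}


def find_pii(body: str):
    """Return (kind, matched_text) pairs for anything that looks like PII."""
    findings = []
    for kind, pattern in PII_PATTERNS.items():
        for match in pattern.findall(body):
            if kind == "email" and match.lower() in ALLOWED_EMAILS:
                continue
            findings.append((kind, match))
    return findings


def can_publish(space_key: str, body: str) -> bool:
    """Allow anything in restricted spaces; block suspected PII in public ones."""
    if space_key not in PUBLIC_SPACES:
        return True
    return not find_pii(body)
```

A check like this only catches text. Screenshots uploaded as images would pass unnoticed, so it complements the demo-account policy rather than replacing it.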
GDPR Compliance: Confluence Policy
confluence_pii_policy:
  creation:
    - Before writing documentation: Assess whether real data is needed
    - If no: Use fictional / masked data
    - If yes: Document business justification
  real_data_handling:
    - Pages containing real customer data are tagged [CONTAINS_PII]
    - The tag automatically restricts access to the owning team only
    - Manager approval required before posting
    - Screenshot template is provided (pre-masked version)
    - Retention: 30 days max
  fictional_data_handling:
    - Demo account is provided (docs+demo@company.com)
    - Use the demo account in all new documentation
    - Old pages are migrated to demo data over time
    - No access control needed (fictional = safe)
  access_control:
    - [CONTAINS_PII] pages: Engineering team only
    - Regular pages: Open access
    - Audit: Confluence admin logs all access
  retention:
    - Pages tagged [CONTAINS_PII]: Auto-delete after 30 days
    - Regular pages: Indefinite
    - Version history: Keep, but restrict access per parent page
  audit:
    - Monthly compliance report: How many [CONTAINS_PII] pages exist?
    - If >5: Investigation required
    - Quarterly: Sample pages, verify no unmasked PII
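The monthly audit rule (more than five tagged pages triggers an investigation) is simple enough to script. This is a minimal sketch, assuming page records are plain dictionaries with a `title` and a `labels` list. The `contains_pii` label string stands in for the [CONTAINS_PII] tag, and the function name is illustrative.

```python
PII_LABEL = "contains_pii"   # stand-in for the [CONTAINS_PII] tag
INVESTIGATION_THRESHOLD = 5  # "If >5: Investigation required"


def monthly_compliance_report(pages):
    """Count tagged pages and flag whether the threshold is exceeded."""
    tagged = [p["title"] for p in pages if PII_LABEL in p.get("labels", [])]
    return {
        "pii_page_count": len(tagged),
        "pii_pages": tagged,
        "investigation_required": len(tagged) > INVESTIGATION_THRESHOLD,
    }
```

The comparison is strictly greater-than, so exactly five tagged pages does not trigger an investigation, matching the policy wording.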
Testing: Confluence PII Audit
Before compliance review:
import re

# `confluence` (an API client) and `is_restricted` are assumed to be defined elsewhere.
def audit_confluence_for_pii():
    all_pages = confluence.get_all_pages()
    for page in all_pages:
        # Check for email patterns
        emails = re.findall(r'\S+@\S+\.\S+', page.content)
        if emails:
            print(f"WARNING: {page.title} may contain emails: {emails[:3]}")
        # Check for phone patterns
        phones = re.findall(r'\d{3}[-.\s]?\d{3}[-.\s]?\d{4}', page.content)
        if phones:
            print(f"WARNING: {page.title} may contain phones: {phones[:3]}")
        # Check for SSN patterns
        ssns = re.findall(r'\d{3}-\d{2}-\d{4}', page.content)
        if ssns:
            print(f"CRITICAL: {page.title} contains SSN patterns!")
        # Check for access control (any PII hit, including SSNs, counts)
        if 'pii' in page.title.lower() or emails or phones or ssns:
            if not is_restricted(page):
                print(f"CRITICAL: {page.title} contains PII but is publicly accessible!")
Remediation Template
For pages with unmasked PII:
1. Screenshot original page (archive)
2. Delete sensitive content or mask it
3. Re-post masked version
4. Audit trail: Document what was removed + why
5. Communication: Notify the team that the page was updated
6. Retention: Schedule deletion in 30 days
Conclusion
Confluence + PII requires proactive management. The best approach is:
- Prefer fictional data (demo accounts for documentation)
- Mask real data (before posting, use masking tools)
- Restrict access (engineering team only for sensitive pages)
- Auto-delete (30-day retention for real data pages)
- Audit regularly (quarterly PII scan of the wiki)
- Training (remind teams: use demo data, not real customers)
This balances documentation value (engineering teams need examples) with privacy (GDPR compliance).