
After the Epstein Files: Why Black-Box Highlighting Is Never True Redaction

The December 2025 DOJ Epstein files release exposed a critical redaction failure: black-highlighted PDF text remains readable via copy-paste. With 71% of legal teams using generative AI tools, understanding what real redaction means has never been more urgent.

March 5, 2026 · 7 min read
document redaction, PDF redaction failure, legal compliance, Word redaction

The December 2025 Redaction Failure

When the US Department of Justice released the Epstein files in December 2025, the coverage quickly shifted from the documents' content to their redactions — and specifically to how easily those redactions could be bypassed.

The mechanism was straightforward: text "redacted" with black highlighting in the PDF files remained in each PDF's text layer. Select the black box, copy it, paste it into a text editor, and the underlying text appears. Concealing the text visually did not remove it; the sensitive information was never deleted.
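To see why, it helps to look at what a PDF page actually stores. The sketch below assumes a hand-written, uncompressed page content stream (real files usually compress their streams and may encode text as hex strings or `TJ` arrays, which this toy parser ignores). The text is drawn with the `Tj` operator and a black rectangle is then painted over it with `re` and `f`; a naive scan for `Tj` operands still returns the hidden words, which is roughly what copy-paste does.

```python
import re

# Hypothetical, uncompressed page content stream: draw the text first,
# then paint a filled black rectangle over the same area.
CONTENT_STREAM = b"""
BT /F1 12 Tf 72 700 Td (Witness: Jane Doe) Tj ET
0 0 0 rg
70 695 120 16 re f
"""

# Matches literal-string operands of the Tj operator, e.g. (Hello) Tj.
# Simplification: ignores escaped parentheses, hex strings and TJ arrays.
TJ_LITERAL = re.compile(rb"\(([^()]*)\)\s*Tj")


def extract_text(stream: bytes) -> list[str]:
    """Return every literal string shown with Tj, regardless of what is painted on top."""
    return [m.decode("latin-1") for m in TJ_LITERAL.findall(stream)]


if __name__ == "__main__":
    print(extract_text(CONTENT_STREAM))  # ['Witness: Jane Doe']
```

Painting order only affects what the eye sees. Both operators remain in the stream, so any tool that reads the text layer (a viewer's copy command, a search index, an AI pipeline) gets the original words.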

This was not a novel vulnerability. The same failure mode surfaced in the 2007 Anthony Pellicano case and has recurred since in court filings, government reports, and corporate document productions. What set the Epstein files apart was their profile: the failure played out in real time in front of tens of millions of people.

Visual Concealment vs. True Redaction

Why does this keep happening? The answer lies in the technical distinction between concealment and deletion.

Visual concealment changes how text looks, or places something on top of it, without removing the text from the file structure. Methods in this category include:

  • Black text highlighting (sets text background to black)
  • White text on white background (changes text color to match background)
  • Drawing a black rectangle shape over text
  • Opaque PDF annotations (an annotation element drawn over the text)
  • Image overlay (places a black image on top of text)

In every case above, the original text remains in the file. It can be recovered by copying the concealed region, removing the overlay element, or examining the raw file structure.

True redaction removes the underlying text from the file permanently. The text is not hidden — it is gone. Nothing remains to recover.

The critical question for any document that leaves your control is: when someone with technical knowledge examines this file, will they find the original text? With visual concealment, the answer is yes.

The Word Document Problem

The same failure mode exists in Microsoft Word. Using black text highlighting, white text color, or opaque text boxes to "redact" a Word document leaves the original text intact in the document's XML structure.
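In a .docx file, black highlighting and white text are just run properties in `word/document.xml`, stored right next to the text they are supposed to hide. The sketch below uses a hypothetical, stripped-down `document.xml` string rather than a complete Word file. It flags runs formatted with a black highlight (`w:highlight`), white font color (`w:color`), or black shading (`w:shd`) and returns the text those runs still contain. It only checks direct run formatting; highlights applied through styles, and shapes or text boxes used as overlays, would need separate handling.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"  # Clark-notation prefix for WordprocessingML tags and attributes

# Hypothetical minimal body: one plain run, one black-highlighted, one white-text run.
SAMPLE_DOCUMENT_XML = f"""<w:document xmlns:w="{W_NS}"><w:body><w:p>
<w:r><w:t xml:space="preserve">Client: </w:t></w:r>
<w:r><w:rPr><w:highlight w:val="black"/></w:rPr><w:t>Acme Holdings</w:t></w:r>
<w:r><w:rPr><w:color w:val="FFFFFF"/></w:rPr><w:t>Home: 123 Main Street</w:t></w:r>
</w:p></w:body></w:document>"""


def concealed_runs(document_xml: str) -> list[tuple[str, str]]:
    """Return (technique, text) for runs whose formatting hides, but keeps, their text."""
    found = []
    for run in ET.fromstring(document_xml).iter(f"{W}r"):
        rpr = run.find(f"{W}rPr")
        if rpr is None:
            continue  # no direct formatting on this run
        text = "".join(t.text or "" for t in run.iter(f"{W}t"))
        highlight = rpr.find(f"{W}highlight")
        color = rpr.find(f"{W}color")
        shading = rpr.find(f"{W}shd")
        if highlight is not None and highlight.get(f"{W}val") == "black":
            found.append(("black highlight", text))
        elif color is not None and (color.get(f"{W}val") or "").upper() == "FFFFFF":
            found.append(("white text", text))
        elif shading is not None and (shading.get(f"{W}fill") or "").upper() == "000000":
            found.append(("black shading", text))
    return found


if __name__ == "__main__":
    for technique, text in concealed_runs(SAMPLE_DOCUMENT_XML):
        print(f"{technique}: {text}")
```

Running a check like this over past productions is one way to estimate exposure: every hit is text that a recipient can read by clearing the formatting or opening the XML.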

This matters because Word documents are the primary format for legal correspondence, contracts, witness statements, HR files, and internal investigations. Organizations that have "redacted" Word documents with highlighting have been producing files whose hidden content remains recoverable wherever those files have since been distributed.

71% of legal teams use generative AI tools despite data residency concerns (ACC 2025). As these tools become part of document workflows, past redaction failures become more likely to surface: a tool that processes a document reads its text layer, including any "redacted" sections that were never actually deleted.

High-Profile Examples of Redaction Failures

The Epstein files were not the first high-profile instance of this failure mode.

The Anthony Pellicano case (2007) involved sensitive information revealed through improperly redacted legal documents filed in federal court.

NSA documents released through FOIA requests have repeatedly been found to contain readable text under black boxes due to PDF redaction failures — a problem documented by security researchers and journalists analyzing national security document releases.

Corporate litigation filings routinely contain inadvertently readable redacted content when filing parties use PDF comment or annotation layers rather than true content deletion.

The consistency of this failure pattern reflects a fundamental gap between how legal professionals conceptualize redaction (as a visual act) and how PDF and Word document formats actually work (as structured data containing text regardless of visual presentation).

What True Redaction Requires

For a document to be truly redacted — such that a technically capable recipient cannot recover the original content — the underlying text must be removed from the file structure and replaced.

In PDF documents, true redaction requires:

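  • Flattening the PDF so that no annotation, form field, or layer remains that a recipient could remove or edit (flattening alone does not delete text that sits beneath an overlay)
  • Removing the underlying text from the page content stream and drawing black rectangles or redaction markers in its place
  • Removing metadata, and earlier revisions preserved by incremental saves, that may contain the original text
  • Removing or re-subsetting embedded fonts so that leftover glyph data cannot aid text reconstruction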
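The content-stream step can be sketched as follows, again assuming an uncompressed stream and literal `Tj` strings only. The difference from concealment is the order of operations: the sensitive bytes are deleted from the text operator first, and the black rectangle is added afterwards purely as a visual marker. `redact_stream` is a hypothetical helper; in practice a maintained library does this work, for example PyMuPDF's `Page.add_redact_annot` followed by `Page.apply_redactions`.

```python
import re

# Hypothetical, uncompressed page content stream that shows one line of text.
STREAM = b"BT /F1 12 Tf 72 700 Td (Witness: Jane Doe) Tj ET\n"

# Literal-string operands of Tj; ignores escapes, hex strings and TJ arrays.
TJ_LITERAL = re.compile(rb"\(([^()]*)\)\s*Tj")


def redact_stream(stream: bytes, secret: bytes, box: tuple[int, int, int, int]) -> bytes:
    """Delete `secret` from every Tj operand, then paint a black marker over `box`."""

    def scrub(match: re.Match) -> bytes:
        # Rebuild the operator without the sensitive bytes.
        return b"(" + match.group(1).replace(secret, b"") + b") Tj"

    cleaned = TJ_LITERAL.sub(scrub, stream)
    x, y, w, h = box
    # The rectangle is now only a visual marker: no text is left beneath it.
    marker = f"0 0 0 rg {x} {y} {w} {h} re f\n".encode("ascii")
    return cleaned + marker


if __name__ == "__main__":
    print(redact_stream(STREAM, b"Jane Doe", (70, 695, 120, 16)).decode("ascii"))
```

Deleting characters from a `Tj` operand changes how far the text cursor advances, so later text in the same text object can shift, and glyph widths and spacing can leak the length of what was removed. Handling those details, together with the metadata and font steps above, is why dedicated redaction tools exist.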

In Word documents, true redaction requires:

  • Finding every instance of the text to be removed (including in tracked changes, comments, revision history, metadata, and embedded objects)
  • Replacing the text content, not overlaying it visually
  • Preserving document formatting without leaving artifacts that indicate what was removed

The key word is replacement: the original text must be replaced with something else, not concealed beneath something else.
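A minimal sketch of replacement, assuming each part is parsed with Python's standard `xml.etree.ElementTree`: every `w:t` (visible text) and `w:delText` (text inside a tracked deletion) element that contains the target string has it swapped for a placeholder, so the original characters no longer appear anywhere in that part. `replace_in_part` and the `[REDACTED]` placeholder are illustrative choices, not a standard API.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
# Keep the conventional "w:" prefix when the part is written back out.
ET.register_namespace("w", W_NS)

# w:t holds visible run text; w:delText holds text inside a tracked deletion.
TEXT_TAGS = {f"{{{W_NS}}}t", f"{{{W_NS}}}delText"}


def replace_in_part(xml_bytes: bytes, secret: str, placeholder: str = "[REDACTED]") -> bytes:
    """Replace `secret` in every text-bearing element of one WordprocessingML part."""
    root = ET.fromstring(xml_bytes)
    for el in root.iter():
        if el.tag in TEXT_TAGS and el.text and secret in el.text:
            el.text = el.text.replace(secret, placeholder)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    part = (
        f'<w:document xmlns:w="{W_NS}"><w:body><w:p>'
        "<w:r><w:t>Call Jane Doe</w:t></w:r>"
        "<w:del><w:r><w:delText>Jane Doe lives here</w:delText></w:r></w:del>"
        "</w:p></w:body></w:document>"
    ).encode("utf-8")
    print(replace_in_part(part, "Jane Doe").decode("utf-8"))
```

Two limitations matter in real files. Word often splits a phrase across several runs (for spelling marks, formatting changes, or revision IDs), so a per-element search can miss a match; a fuller implementation would search each paragraph's concatenated text and map hits back to runs. ElementTree may also rename or drop namespace prefixes it has not been told about, which can conflict with Word's `mc:Ignorable` markup, so a production tool would use a prefix-preserving parser.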

The Headers, Footers, and Comments Problem

Legal document redaction has additional complexity beyond the main text body. Sensitive information appears in locations that visual redaction tools often miss entirely:

Headers and footers frequently contain matter names, client identifiers, confidential designations, and document control numbers. Black-highlighting the body of a contract while leaving "Privileged and Confidential — Re: TechCorp/MegaStartup Acquisition" in the header defeats the purpose of the exercise.

Comments and tracked changes are a consistent source of inadvertent disclosure. A reviewer who comments "see John Smith's testimony about this clause" leaves that comment in the document even after the clause itself is "redacted."

Document properties and metadata contain author names, company names, revision history, and summary information that can identify the document's origin even when the content is redacted.

Revision history in Word documents can preserve previous versions of edited text. If Track Changes was on when a document that said "the plaintiff's home address is 123 Main Street" was edited to "the plaintiff's address," the original wording stays in the file as a tracked deletion until the change is accepted.
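Each of these zones lives in its own part of the .docx ZIP archive: headers and footers in `word/header*.xml` and `word/footer*.xml`, comments in `word/comments.xml`, notes in `word/footnotes.xml` and `word/endnotes.xml`, and document properties under `docProps/`. Tracked deletions stay inside whichever part they occur in, as `w:delText` elements. The sketch below (`find_term_by_zone` and its zone labels are hypothetical) opens the archive with the standard `zipfile` module and reports which zones still contain a given term.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

# Hypothetical zone labels mapped to .docx part names (regexes on ZIP entry names).
ZONE_PATTERNS = [
    ("body", re.compile(r"word/document\.xml")),
    ("header", re.compile(r"word/header\d*\.xml")),
    ("footer", re.compile(r"word/footer\d*\.xml")),
    ("comments", re.compile(r"word/comments\.xml")),
    ("footnotes", re.compile(r"word/footnotes\.xml")),
    ("endnotes", re.compile(r"word/endnotes\.xml")),
    ("properties", re.compile(r"docProps/(core|app|custom)\.xml")),
]


def part_text(xml_bytes: bytes) -> str:
    """All character data in one XML part, including w:delText and property values."""
    return "".join(ET.fromstring(xml_bytes).itertext())


def find_term_by_zone(docx_bytes: bytes, term: str) -> dict[str, list[str]]:
    """Map each zone to the parts in which `term` still appears (case-insensitive)."""
    hits: dict[str, list[str]] = {}
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        for name in zf.namelist():
            for zone, pattern in ZONE_PATTERNS:
                if pattern.fullmatch(name) and term.lower() in part_text(zf.read(name)).lower():
                    hits.setdefault(zone, []).append(name)
    return hits
```

Joining all character data in a part catches phrases split across runs, but it can also report a match that spans two unrelated fields; for a disclosure check, an occasional false positive is the safer failure.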

Building a Compliant Redaction Process

Given the failure modes above, a compliant redaction process requires:

1. Use native Word integration for Word documents: Redaction that works within the Word document object model, replacing text content directly in the document structure, is the most direct way to avoid the concealment-vs-deletion problem. Converting to PDF and redacting the PDF introduces format transformation risk and may not properly handle comments, tracked changes, or revision history.

2. Process all document zones: Any compliant redaction process must include explicit processing of headers, footers, footnotes, endnotes, comments, tracked changes, and document properties — not just the main body text.

3. Verify the output: After redaction, verify the result by attempting to recover the redacted content. Copy-paste the redacted areas. Open the document's XML structure. Check tracked changes and revision history. If original content appears anywhere, the redaction is incomplete.

4. Maintain an audit trail: For legal productions, document what was redacted, by what method, and by whom. This becomes relevant if a privilege dispute or redaction challenge arises.
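Steps 3 and 4 can be automated together. The sketch below uses only the standard library: it searches every part of a .docx archive (element text, attribute values such as `w:author`, and a raw-byte fallback for non-XML parts) for the terms that should be gone, and builds an audit record only if none are found. The names `find_leaks`, `verify_and_record`, and `RedactionRecord`, and the record's fields, are assumptions for illustration, not a prescribed format.

```python
import hashlib
import io
import json
import zipfile
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


def searchable_text(name: str, data: bytes) -> str:
    """Everything a recipient could read out of one ZIP entry."""
    if name.endswith((".xml", ".rels")):
        try:
            root = ET.fromstring(data)
        except ET.ParseError:
            return data.decode("utf-8", errors="ignore")
        # Joined character data catches phrases split across runs; attribute
        # values catch names such as w:author on comments and tracked changes.
        attrs = [value for el in root.iter() for value in el.attrib.values()]
        return "".join(root.itertext()) + "\n" + "\n".join(attrs)
    # Images, embedded objects, etc.: crude raw-byte fallback (Latin-1 text only).
    return data.decode("latin-1")


def find_leaks(docx_bytes: bytes, terms: list[str]) -> list[tuple[str, str]]:
    """Return (part name, term) for every term still present in the archive."""
    leaks = []
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        for name in zf.namelist():
            haystack = searchable_text(name, zf.read(name)).lower()
            leaks.extend((name, term) for term in terms if term.lower() in haystack)
    return leaks


@dataclass(frozen=True)
class RedactionRecord:
    document_id: str    # hypothetical production or control number
    category: str       # basis for redacting, e.g. "privilege" or "PII: name"
    method: str         # how, e.g. "text replaced in document XML"
    reviewer: str       # who performed or approved the redaction
    terms_checked: int  # how many terms were verified absent (terms not stored)
    output_sha256: str  # fingerprint of the exact file that was produced
    timestamp: str      # UTC, ISO 8601


def verify_and_record(docx_bytes: bytes, terms: list[str], document_id: str,
                      category: str, method: str, reviewer: str) -> RedactionRecord:
    """Refuse to record a production until none of `terms` can be recovered."""
    leaks = find_leaks(docx_bytes, terms)
    if leaks:
        parts = sorted({name for name, _ in leaks})
        # Report where the leaks are, not what leaked, so the error stays clean.
        raise ValueError(f"redaction incomplete; terms recoverable in {parts}")
    return RedactionRecord(
        document_id=document_id,
        category=category,
        method=method,
        reviewer=reviewer,
        terms_checked=len(terms),
        output_sha256=hashlib.sha256(docx_bytes).hexdigest(),
        timestamp=datetime.now(timezone.utc).isoformat(),
    )


def to_log_line(record: RedactionRecord) -> str:
    """One JSON object per line, suitable for an append-only audit log."""
    return json.dumps(asdict(record), sort_keys=True)
```

The record deliberately stores a count of verified terms and a SHA-256 fingerprint of the produced file rather than the removed text, so the audit log does not become a second copy of the sensitive data. An empty search result is not proof of complete redaction: text in images, or in embedded objects stored as UTF-16 or compressed data, will not be found this way.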

The Epstein Files as a Teaching Moment

The Epstein files redaction failure was embarrassing for the DOJ, but it provides a concrete, publicly visible demonstration of exactly what happens when visual concealment is confused with true redaction.

Every legal team, government agency, and compliance professional that watched the story unfold should ask: what is in our organization's past document productions that could be similarly recovered? What is our current redaction process, and does it actually delete text or merely conceal it?

The answers to those questions determine actual exposure, not the existence of a redaction policy.


anonym.legal's Office Add-in performs true PII replacement within Word documents — replacing text content directly in the document structure, not overlaying it visually. Headers, footers, footnotes, comments, and tracked changes are processed. The result is a document from which the original text is absent, not hidden.
