Five Redaction Mistakes That Expose Personal Data

Public bodies and law firms have been caught out by inadequate redaction - hidden text behind black boxes, overlay failures that anyone can reverse with a free PDF tool, metadata surviving the process entirely. These are the most common errors, illustrated by named cases.

By RedactProof Editorial Team · 18 Feb 2026 · 9 min read

This article is for general informational purposes only and does not constitute legal advice. Regulatory requirements vary by jurisdiction and change over time. Consult a qualified legal professional for advice specific to your organisation's circumstances.

Overlay redaction instead of pixel-burn

This is the mistake that makes headlines. Someone draws a black rectangle over sensitive text in a PDF, exports the file, and sends it out. The text underneath has not gone anywhere. Select all, copy, paste into a text editor - there it is.

In January 2019, lawyers representing Paul Manafort filed a court document in the Mueller investigation using exactly this method. The filing contained redaction bars drawn over text without removing the underlying data. Within hours, journalists discovered that copying and pasting the blacked-out sections revealed their full contents - including that Manafort had shared Trump campaign polling data with his Russian-linked associate Konstantin Kilimnik, and that the two had met in Madrid. The filing was pulled from public access and replaced. The damage, in terms of what was now publicly known, was already done.

The distinction matters because most general-purpose PDF tools - including annotation features in Adobe Reader, Preview on Mac, and various free online editors - only add a visual layer. The text data remains in the file structure. The ICO's guidance on disclosing documents securely warns about this risk explicitly. Genuine redaction requires a tool that replaces the underlying text with image data or removes it from the file entirely - pixel-burn redaction. A visual overlay is not redaction. It is concealment, and concealment fails.

How to check: open your redacted PDF and try to select and copy text from behind the redaction marks. If readable text transfers to the clipboard, the redaction is cosmetic. With pixel-burn redaction, the copy returns either nothing or image data.

Forgetting document metadata

You've carefully redacted every name from a 40-page report. The document properties still show the original author's full name, email address, and company name in the Created by field. The revision history contains earlier versions with unredacted content. Embedded comments from reviewers include names and timestamps.

In February 2021, the Canada Border Services Agency (CBSA) and Immigration, Refugees and Citizenship Canada (IRCC) found themselves in this position during Federal Court proceedings. According to an affidavit filed in the case and reported by CBC News, CBSA staff had used a software tool to highlight sensitive text and change the highlight colour from yellow to black - the document equivalent of painting over text in Word. When the documents were released to an applicant's lawyer, the supposedly redacted text could simply be lifted off. The agency's own court filing stated it was "unaware that the blacked-out text could be lifted to reveal confidential and sensitive information." Four additional pieces of sensitive content were then found to have been missed entirely.

Metadata is the information about a document that most people never look at. A thorough redaction process strips author fields, revision history, comments, annotations, bookmarks, and any embedded attachments. Most dedicated redaction tools handle this automatically. If yours does not, clean the metadata separately before disclosure.
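As a rough illustration of what a metadata check looks for, the sketch below scans the raw bytes of a PDF for common document-information entries and for an XMP packet. It is a triage aid under stated assumptions, not a full parser: the function name `audit_metadata` is ours, the pattern only handles simple literal strings, and metadata held in compressed object streams will not be seen. A clean result does not prove the file is clean.

```python
import re

# Common keys in a PDF document information dictionary.
INFO_KEYS = rb"Author|Creator|Producer|Title|Subject|Keywords"

# Matches entries such as "/Author (Sarah Thompson)".
# Simplification: nested parentheses and hex strings are not handled.
INFO_ENTRY = re.compile(rb"/(" + INFO_KEYS + rb")\s*\(((?:\\.|[^\\()])*)\)")


def audit_metadata(pdf_bytes):
    """Return Info-dictionary values visible in the raw file, plus an XMP flag."""
    found = {}
    for key, value in INFO_ENTRY.findall(pdf_bytes):
        found.setdefault(key.decode("ascii"), []).append(value.decode("latin-1"))
    return {"info": found, "has_xmp": b"<x:xmpmeta" in pdf_bytes}


# Hypothetical fragment of an uncompressed PDF, for illustration only.
sample = b"1 0 obj\n<< /Author (Sarah Thompson) /Producer (ExampleWriter 1.0) >>\nendobj\n"
report = audit_metadata(sample)
print(report["info"])
```

Any value that names a person or organisation is a candidate for cleaning before disclosure. In practice you would read the bytes from the exported file, for example with `open(path, "rb").read()`.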

Inconsistent name redaction

A witness is named "Sarah Thompson" on page 3 and the redactor catches it. On page 17, a reference to "Ms Thompson" slips through. On page 41, "S. Thompson" appears in a table footer. Same person, three different formats, only one caught.

Automated detection helps here - AI-based tools that understand entity context can link variations of the same name across a document. Even with automation, a final manual pass is sensible for anything going out externally. Search for fragments of known names: initials, titles alone, surnames without forenames. All of these can identify a person.
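The fragment search described above can be sketched in a few lines. This is a minimal example, not how any particular tool links entities: it assumes a name with at least a forename and a surname, a short hypothetical list of titles, and plain text already extracted from the document. The helper names `name_variants` and `find_mentions` are ours.

```python
import re

# Hypothetical title list; extend it for your own documents.
TITLES = ("Mr", "Mrs", "Ms", "Miss", "Dr")


def name_variants(full_name):
    """Return common written forms of a name given as 'Forename Surname'."""
    parts = full_name.split()
    forename, surname = parts[0], parts[-1]
    initial = forename[0]
    variants = [f"{forename} {surname}", f"{initial}. {surname}", f"{initial} {surname}"]
    variants += [f"{title} {surname}" for title in TITLES]
    variants.append(surname)  # surname alone can still identify a person
    return variants


def find_mentions(text, full_name):
    """Return (offset, matched text) for every variant found in text."""
    # Longest variants first, so "Ms Thompson" is reported rather than "Thompson".
    variants = sorted(set(name_variants(full_name)), key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(v) for v in variants) + r")\b",
        re.IGNORECASE,
    )
    return [(m.start(), m.group(0)) for m in pattern.finditer(text)]


text = "Sarah Thompson gave evidence. Ms Thompson later left. Footer: S. Thompson"
mentions = [found for _, found in find_mentions(text, "Sarah Thompson")]
print(mentions)
```

A search like this will over-match when a surname is also a common word or is shared by several people, so treat the hits as a list to review rather than a list to redact blindly.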

The same applies to organisations. A firm redacting references to a client company might catch the full trading name but miss an abbreviated version used in email threads, or a registration number that points to the same entity. Context-aware detection is the difference between a thorough review and a lucky one.

RedactProof detects 40+ types of personal information - names, partial names, NI numbers, dates of birth, account references, and more - and links entity variations across documents automatically.

Ignoring embedded content

PDFs can contain embedded images, attached files, form field data, and occasionally JavaScript. A scanned letter embedded as an image inside a PDF will not be caught by text-based redaction - OCR needs to run first to make that text searchable and redactable.

PDF form fields store their values separately from the visible text layer. Redacting what you see on the page does not remove the data held in the underlying form field object. Attached files embedded within a PDF - contracts with annexes, emails with attachments - need individual review at every nesting level.
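One quick way to find out whether a file has any of these structures is to look for their markers in the raw bytes. The sketch below counts a few standard PDF name tokens. It is a first-pass triage under stated assumptions: the function name `embedded_content_report` is ours, and markers stored inside compressed object streams will not be counted, so a zero is a hint rather than a guarantee.

```python
import re

# Standard PDF name tokens that signal content beyond the visible page text.
MARKERS = {
    "form_fields": re.compile(rb"/AcroForm\b"),
    "embedded_files": re.compile(rb"/EmbeddedFiles\b"),
    "javascript": re.compile(rb"/(?:JavaScript|JS)\b"),
    "images": re.compile(rb"/Subtype\s*/Image\b"),
}


def embedded_content_report(pdf_bytes):
    """Count how often each marker appears in the raw file."""
    return {label: len(p.findall(pdf_bytes)) for label, p in MARKERS.items()}


# Hypothetical fragment of an uncompressed PDF, for illustration only.
sample = (
    b"<< /Type /Catalog /AcroForm 5 0 R /Names << /EmbeddedFiles 6 0 R >> >>\n"
    b"<< /Type /XObject /Subtype/Image /Width 100 >>\n"
)
report = embedded_content_report(sample)
print(report)
```

Any non-zero count tells the reviewer which of the checks in this section apply to the file. Attachments that are themselves PDFs need the same treatment at each nesting level.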

We've seen firms send disclosure bundles where the cover sheet is properly redacted but an embedded email attachment three levels deep still contains personal data. The submitted file passes a surface review. The embedded attachment does not.

Assuming one review is enough

Redaction is quality assurance work. One pass catches most things. A second pass - ideally by someone who was not the first reviewer - catches what the first missed. This is not about competence. Concentration dips by page 60 of a 120-page document. Names in footers become invisible. Dates in table cells blur together.

When a second human reviewer is not available, running the document through automated detection after manual review acts as that second check. The tool will not catch everything human judgement catches, but it picks up different things than a tired reviewer does.
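A very simple version of that automated second check is a pattern sweep over the text of each page. The sketch below is far cruder than context-aware detection and is only meant to show the idea. The patterns are deliberately simplified assumptions (the National Insurance pattern, for example, does not enforce the official prefix rules), and `second_pass` is a hypothetical helper that expects page text you have already extracted.

```python
import re

# Simplified patterns, for illustration only. Real detection needs far more care.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ni_number": re.compile(r"\b[A-Z]{2}\s?\d{2}\s?\d{2}\s?\d{2}\s?[A-D]\b"),
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}


def second_pass(pages):
    """Return (page number, pattern label, matched text) for every hit."""
    hits = []
    for page_no, text in enumerate(pages, start=1):
        for label, pattern in PATTERNS.items():
            for match in pattern.finditer(text):
                hits.append((page_no, label, match.group(0)))
    return hits


pages = [
    "Nothing sensitive on this page.",
    "Contact j.doe@example.org, NI QQ 12 34 56 C, born 04/07/1981.",
]
hits = second_pass(pages)
for hit in hits:
    print(hit)
```

Run against the text of the redacted output, any hit is something the first review left behind. Reporting the page number matters because it sends the reviewer straight to the footer or table cell that was skipped.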

There is a less obvious dimension to this mistake: the assumption that a document, once redacted, stays redacted. If a document is edited after redaction - a correction, an appended exhibit, a version update - the redaction needs to be re-applied to the modified version. A document that was compliant at export may not be compliant after modification.

Releasing documents without verifying the output file

In 2025, a memorandum of understanding between the US Office of Personnel Management (OPM) and agencies including the Social Security Administration - relating to DOGE staffers accessing federal systems - was filed in federal court proceedings with redactions that failed to conceal the underlying text. According to court documents and reporting by NPR, the document had been improperly redacted, allowing hidden content to be read. The incident exposed details about the scope of access that DOGE personnel had been granted to sensitive databases containing personal information on millions of Americans.

The pattern across the cases on this page is consistent. A document is prepared under time pressure. A visual redaction method is applied. Nobody verifies the output before release. The Manafort filing was pulled within hours of journalists discovering the failure. The CBSA documents were caught because the receiving lawyer recognised what had happened. The OPM filing became evidence in ongoing litigation. In each case, the redaction method failed - but so did the verification step that should have caught it.

Verification should be the last step before any disclosure - separate from the redaction process, testing the actual output file as if you were the recipient.

How to verify your redaction is genuine

These checks should run on the output file before it leaves your hands. Not on the version open in your redaction tool - on the exported file, in a separate application.

1. Copy test. Select text from behind every redaction mark and try to copy it. If readable text transfers to the clipboard, the redaction is overlay only. Run this in a viewer different from the one you used to create the redactions.

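2. Raw file inspection. Open the PDF in a plain text editor and search for strings you expect to be redacted. If they appear in the raw content, they have not been removed - only hidden. A clean search is not proof on its own: PDF content is often stored in compressed streams that a text editor cannot read, so treat this as a supplement to the copy test, not a replacement for it.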

3. Metadata audit. Check document properties for author name, company, revision history, and embedded comments. In most PDF viewers, this is under File > Properties. Any field containing personal data that should be removed needs cleaning before disclosure.

4. Embedded content check. If the document contains images, confirm OCR has run and any text recognised in those images has been redacted. Check for embedded file attachments. Form fields should be flattened or their values cleared before export.

5. Verification certificate. If your redaction tool supports tamper-evident certificates, the certificate records the cryptographic state of the document at the point of export. Any subsequent modification - including re-adding text that was redacted - produces a detectable mismatch. This is an auditable record of what the document contained at disclosure, not a substitute for the checks above.
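Check 2 can be scripted so that it also looks inside compressed streams, which a plain text editor cannot do. The sketch below searches the raw file and every stream that inflates cleanly with zlib. It rests on assumptions worth stating: `leaked_terms` is our own helper, only Flate compression is handled, and text stored with hex strings, custom font encodings, or split across positioning operators can still be missed. A hit is conclusive. A miss is not.

```python
import re
import zlib

# Captures the bytes between "stream" and "endstream" in each PDF object.
STREAM = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate(data):
    """Inflate zlib data, tolerating trailing bytes; return b'' on failure."""
    try:
        return zlib.decompressobj().decompress(data)
    except zlib.error:
        return b""  # not Flate-compressed, or uses another filter


def leaked_terms(pdf_bytes, terms):
    """Return the terms found in the raw file or in any inflated stream."""
    blobs = [pdf_bytes] + [inflate(m.group(1)) for m in STREAM.finditer(pdf_bytes)]
    return [t for t in terms if any(t.encode("latin-1") in blob for blob in blobs)]


# Hypothetical single-stream file built in memory, for illustration only.
content = zlib.compress(b"BT (Konstantin Kilimnik) Tj ET")
sample = (
    b"%PDF-1.4\n1 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
    + content
    + b"\nendstream\nendobj\n"
)
leaks = leaked_terms(sample, ["Kilimnik", "Madrid"])
print(leaks)
```

Feed it the list of names and references you redacted and the bytes of the exported file. Anything it returns is still in the document.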

For a detailed walkthrough of what to look for in a redacted output file, see our guide to verifying redacted document integrity. For how tamper-evident certificates work technically, see verification certificates. For a practical step-by-step on the redaction process itself, how to redact a PDF covers the full workflow.

Frequently Asked Questions

How can I tell if my PDF has been genuinely redacted?

Open the document in a PDF viewer and try to select and copy the text behind any redaction marks. If readable text transfers to your clipboard, the redaction is overlay only - the data is still in the file. Genuine pixel-burn redaction removes or replaces the underlying text data, so copying returns nothing or image data. You can also open the PDF in a plain text editor and search for strings you expect to be redacted. If they appear in the raw file, they have not been removed. The Manafort court filing in January 2019 is the best-known example of this failure.

Does printing and re-scanning count as redaction?

It removes digital text data, but creates other problems. The result is a flat image PDF, so text is no longer searchable, file size increases, and screen reader accessibility is lost. Document quality degrades with each print-scan cycle. Dedicated redaction tools achieve the same data destruction while preserving document usability. The print-scan method also does not settle metadata or embedded attachments: the scanned file carries whatever metadata the scanning software writes, and attachments embedded in the original are not printed at all, so they still need separate review.

What personal data is most commonly missed during redaction?

From what we've seen, the most frequently missed categories are: partial name references (initials, surname only, nicknames), data in headers and footers that repeat across pages, information inside tables and form fields, metadata fields (author name, revision history, comments), and content within embedded images that has not been OCR processed. Automated detection tools scan for 40+ types of personal information, which helps catch the categories that manual review tends to miss.

Can redacted text in a PDF be recovered or reversed?

With overlay redaction, yes - the text has not been removed, only visually covered. Anyone can copy the text behind the black box into a text editor. With pixel-burn redaction, no - the underlying text data has been destroyed and there is no recovery path. This is why the method of redaction matters, not just whether something looks blacked out. If you are unsure which method your tool uses, run the copy test: try to copy text from behind a redaction mark. Readable text confirms it is overlay only.

What is a verification certificate and does it prove redaction was done correctly?

A verification certificate is a cryptographic record of the document's state at the point of export. It contains a hash of the redacted file - if the document is modified after export, the hash no longer matches and any verification attempt will fail. It does not confirm that all sensitive data was found and redacted (that is a judgement call about what should be removed). It confirms that the document has not been altered since the certificate was issued. For disclosure contexts, it provides an auditable record that can be checked by a receiving party.
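The hash comparison at the heart of this can be shown with the standard library. This is a minimal sketch of the general idea, not the format any particular product uses: the field names and the functions `issue_certificate` and `verify` are assumptions, and a real scheme would also sign the record so that the certificate itself cannot be quietly replaced.

```python
import hashlib
import hmac
from datetime import datetime, timezone


def issue_certificate(pdf_bytes):
    """Record a SHA-256 digest of the exported file and the time of issue."""
    return {
        "algorithm": "sha256",
        "digest": hashlib.sha256(pdf_bytes).hexdigest(),
        "issued_at": datetime.now(timezone.utc).isoformat(),
    }


def verify(pdf_bytes, certificate):
    """Return True only if the file still matches the recorded digest."""
    actual = hashlib.sha256(pdf_bytes).hexdigest()
    return hmac.compare_digest(actual, certificate["digest"])


exported = b"%PDF-1.4 ... redacted output ..."  # placeholder bytes for illustration
certificate = issue_certificate(exported)

unchanged = verify(exported, certificate)
tampered = verify(exported + b" appended exhibit", certificate)
print(unchanged, tampered)
```

Changing even a single byte changes the digest, which is why any edit after export is detectable. The digest says nothing about whether the right content was redacted in the first place.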
