Five Redaction Mistakes That Expose Personal Data
A Freedom of Information response from a UK council in 2023 accidentally exposed the personal details of vulnerable individuals. The cause was overlay redaction - black boxes placed on top of text that anyone could lift off with a free PDF tool. It's one of the most common redaction failures, and far from the only one.
By RedactProof Editorial Team · Feb 18, 2026
Overlay redaction instead of pixel-burn
This is the mistake that makes headlines. Someone draws a black rectangle over sensitive text in a PDF, exports the file, and sends it out. The text underneath hasn't gone anywhere. Select all, copy, paste into Notepad - there it is.
The distinction matters because most general-purpose PDF tools (including the default annotation features in Adobe Reader, Preview on Mac, and various free online editors) only add a visual layer. The text data remains in the file structure. Genuine redaction requires a tool that replaces the underlying text with image data or removes it from the file entirely. This is called pixel-burn redaction.
How to check: open your redacted PDF in any viewer and try to select and copy the text behind the redaction marks. If readable text comes out, the redaction is cosmetic. If you get nothing - or only image data - it's genuine.
Forgetting document metadata
You've carefully redacted every name from a 40-page report. But the document properties still show the original author's full name, email address, and the company name in the "Created by" field. The revision history might contain earlier versions with unredacted content. Embedded comments from reviewers can include names and timestamps.
Metadata is the information about the document that most people never look at. A thorough redaction process strips this as well: author fields, revision history, comments, annotations, bookmarks, and any embedded attachments. Most dedicated redaction tools handle this automatically. If yours doesn't, you'll need to clean metadata separately.
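As a rough illustration of what "checking metadata" can look like, the sketch below scans the raw bytes of a PDF for common document-information keys and reports any that still hold a value. It is a minimal example under stated assumptions, not a replacement for a proper metadata cleaner: the function name `leftover_info_fields` is invented here, it only sees uncompressed literal strings, and it ignores XMP metadata, hex-encoded strings, and values with nested parentheses. An empty result therefore proves nothing; a non-empty one is worth investigating.

```python
import re

# Keys defined for the PDF document information dictionary.
# /Title also appears in bookmark (outline) entries, which is useful here.
INFO_FIELD_RE = re.compile(
    rb"/(Author|Creator|Producer|Title|Subject|Keywords)\s*\(((?:\\.|[^\\()])*)\)"
)


def leftover_info_fields(pdf_bytes: bytes) -> list[tuple[str, str]]:
    """Return (key, value) pairs for non-empty literal strings found in the raw file.

    Hypothetical helper: it misses anything stored in compressed object
    streams, hex strings, or XMP packets.
    """
    fields = []
    for key, value in INFO_FIELD_RE.findall(pdf_bytes):
        if value.strip():
            fields.append((key.decode("ascii"), value.decode("latin-1")))
    return fields
```

Running this on the redacted output before release gives a quick list of fields to question, such as an author name that should not be there.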
Inconsistent name redaction
A witness is named "Sarah Thompson" on page 3 and the redactor catches it. On page 17, a reference to "Ms Thompson" slips through. On page 41, "S. Thompson" appears in a table footer. Same person, three different formats, only one caught.
Automated detection helps here - AI-based tools that understand entity context can link variations of the same name across a document. But even with automation, a final manual pass is sensible for anything being disclosed externally. Search the document for fragments of known names, not just the full version.
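The fragment search suggested above can be sketched in a few lines of Python. This is an illustration rather than a description of how any particular tool works: it assumes the text of each page has already been extracted into a list of strings, and the names `name_fragments` and `find_mentions` are invented for the example.

```python
import re


def name_fragments(full_name: str) -> list[str]:
    """Split a known name into searchable parts, dropping single letters."""
    return [part for part in full_name.split() if len(part) > 1]


def find_mentions(pages: list[str], full_name: str) -> list[tuple[int, str]]:
    """Return (page_number, snippet) for every whole-word fragment match.

    Pages are numbered from 1. Matching is case-insensitive.
    """
    hits = []
    for page_no, text in enumerate(pages, start=1):
        for fragment in name_fragments(full_name):
            pattern = re.compile(rf"\b{re.escape(fragment)}\b", re.IGNORECASE)
            for match in pattern.finditer(text):
                start = max(0, match.start() - 15)
                hits.append((page_no, text[start:match.end() + 15]))
    return hits


pages = [
    "Statement of Sarah Thompson.",
    "Ms Thompson later confirmed this.",
    "Prepared by S. Thompson",
]
for page_no, snippet in find_mentions(pages, "Sarah Thompson"):
    print(page_no, repr(snippet))
```

Searching for the surname alone is what surfaces "Ms Thompson" and "S. Thompson". The trade-off is more false positives for common surnames, which is why each hit is returned with surrounding context for a human to judge.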
Ignoring embedded content
PDFs can contain embedded images, attached files, form field data, and even JavaScript. A scanned letter embedded as an image within a PDF won't be caught by text-based redaction. OCR (optical character recognition) needs to run first to make the text in those images searchable and redactable.
Similarly, PDF form fields store their values separately from the visible text. Redacting what you see on the page doesn't necessarily remove the data stored in the form field object. And attached files embedded within a PDF (contracts with annexes, emails with attachments) need individual review.
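One way to get a first impression of what a PDF carries beyond its visible text is to count the structural markers that the PDF format uses for these features. The sketch below does that against the raw file bytes. It is a hedged example: `embedded_content_report` and the `MARKERS` table are invented for illustration, and because newer PDFs can store objects inside compressed object streams, a count of zero does not prove the feature is absent.

```python
import re

# PDF name tokens associated with each kind of embedded content.
MARKERS = {
    "embedded files": rb"/EmbeddedFiles\b",
    "form fields": rb"/AcroForm\b",
    "JavaScript": rb"/(?:JavaScript|JS)\b",
    "images": rb"/Subtype\s*/Image\b",
}


def embedded_content_report(pdf_bytes: bytes) -> dict[str, int]:
    """Count occurrences of each marker in the uncompressed part of the file."""
    return {
        label: len(re.findall(pattern, pdf_bytes))
        for label, pattern in MARKERS.items()
    }
```

Any non-zero count is a prompt to open that part of the document and review it separately, rather than evidence of a problem in itself.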
We've seen firms send disclosure bundles where the cover sheet is properly redacted but an embedded email attachment three levels deep still contains personal data.
Assuming one review is enough
Redaction is quality assurance work. One pass catches most things. A second pass - ideally by a different person - catches what the first missed. This isn't about competence. It's about attention fatigue. By page 60 of a 120-page document, concentration dips. Names in footers become invisible. Dates in table cells blur together.
If a second human reviewer isn't available, running the document through automated detection after your manual review acts as that second check. The tool won't catch everything, but it catches different things than a tired human does.
Frequently Asked Questions
How can I tell if my PDF has been genuinely redacted?
Open the document in a PDF viewer, then try to select and copy the text behind any redaction marks. If you can copy readable text, the redaction is overlay only - the data is still in the file. Genuine pixel-burn redaction removes or replaces the underlying text data entirely. You can also open the PDF in a text editor and search for strings you expect to be redacted. If they appear in the raw file content, they haven't been removed. Because PDF content is often compressed, finding nothing in the raw file is not proof on its own that the text is gone.
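The raw-file search can be scripted, and a script can also look inside compressed streams, which a text editor cannot. The following is a minimal sketch using only the Python standard library. The function name `find_leaked_strings` is invented, and the approach assumes streams are Flate-compressed and that text is stored as plain strings. Many PDFs encode text through custom font mappings or split it into pieces, so a hit is strong evidence of a leak, while a miss is not evidence of safety.

```python
import re
import zlib

STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)


def find_leaked_strings(pdf_bytes: bytes, needles: list[str]) -> set[str]:
    """Return the needles still present in the raw or inflated PDF data."""
    haystacks = [pdf_bytes]
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates a slightly truncated stream.
            haystacks.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            continue  # not Flate data (for example, a JPEG image stream)

    found = set()
    for needle in needles:
        if not needle:
            continue
        # Check UTF-8 and UTF-16BE, since metadata strings may use the latter.
        encodings = (needle.encode("utf-8"), needle.encode("utf-16-be"))
        if any(enc in hay for enc in encodings for hay in haystacks):
            found.add(needle)
    return found
```

A typical use is to pass in the list of names and identifiers that were supposed to be redacted and treat any returned value as a failed redaction.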
Does printing and re-scanning count as redaction?
It removes digital text data, but it's crude and creates other problems. The resulting PDF is a flat image, so file size usually increases substantially and the text is no longer searchable or accessible to screen readers. You also lose document quality with each print-scan cycle. Dedicated redaction tools achieve the same data destruction while preserving document quality and accessibility.
What personal data is most commonly missed during redaction?
From what we've seen, the most frequently missed categories are: partial name references (initials, surname only, nicknames), data embedded in headers and footers that repeat across pages, information inside tables and form fields, metadata fields (author name, revision history, comments), and content within embedded images that hasn't been OCR'd. Automated PII detection tools like RedactProof scan for 40+ types of personal information, which helps catch categories that manual review tends to miss.