
Pseudonymisation, Anonymisation, and Redaction: Which One Do You Actually Need?

A data protection officer at a mid-size insurer is asked three different questions in the same week. The marketing team wants to use customer data for analytics without consent. The legal team needs to disclose claim files in litigation. And a researcher at a partner university wants access to historical policy data for an actuarial study. Each scenario calls for a different technique - pseudonymisation, anonymisation, or redaction - and choosing the wrong one creates either a compliance gap or unnecessary data loss.

By RedactProof Editorial Team · 25 Feb 2026 · 8 min read


Three techniques, three legal outcomes

The confusion between these three terms is not just semantic. Under GDPR - and by extension the UK GDPR and Data Protection Act 2018 - each technique produces a different legal result. That legal result determines what you can do with the data afterward, what obligations still apply, and what happens if something goes wrong.

Pseudonymisation replaces direct identifiers with artificial ones - a reference code, a token, a hash - while keeping a separate key that allows re-identification. The data is still personal data under GDPR, and all data protection obligations continue to apply. Article 4(5) defines it explicitly: processing personal data so that it can no longer be attributed to a specific individual without additional information, provided that the additional information is kept separately and is subject to technical and organisational measures that prevent re-attribution.

Anonymisation removes or transforms data so that no individual can be identified by any means reasonably likely to be used - whether by the data controller or by anyone else, and even when combined with additional information. Truly anonymised data falls outside GDPR entirely, as Recital 26 confirms. It is no longer personal data, and data protection obligations no longer apply to it.

Redaction permanently removes specific information from a document before disclosure. It is a practical technique rather than a legal concept defined in GDPR. Depending on what you redact and how thoroughly, the result might be pseudonymised data, anonymised data, or something in between. Redaction is the mechanism; pseudonymisation and anonymisation describe the outcome.

Pseudonymisation: still personal data, still regulated

The European Data Protection Board published Guidelines 01/2025 on Pseudonymisation in January 2025, providing the most detailed regulatory guidance on this topic to date. The central message is unambiguous: pseudonymised data remains personal data. Organisations cannot use pseudonymisation to sidestep GDPR requirements.

What pseudonymisation does offer is a recognised security measure under Article 32 and a mitigating measure that can lower the risk identified in a Data Protection Impact Assessment under Article 35. It can also serve as a safeguard that enables certain processing activities that might otherwise be difficult to justify - particularly further processing under Article 6(4) and research processing under Article 89.

A hospital replacing patient names with reference codes in a research database is pseudonymising. The ward staff can still link the codes back to patients using the hospital's records system. The data is protected against casual exposure - a researcher who sees "Patient R-4471" cannot identify the individual from that alone - but the link back to the real identity exists and is maintained.

The EDPB guidelines stress that the strength of pseudonymisation depends on the controls around the re-identification key. If the key is stored on the same system as the pseudonymised data, the protection is minimal. If it is held by a separate controller with robust access controls, the protection is meaningful. The technique is only as strong as the governance around it.
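The key-separation point can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: the function name make_pseudonym, the sample records, the demo key, and the "R-" prefix are all hypothetical. It uses a keyed hash (HMAC-SHA256) so that codes cannot be recomputed from names without the key, and it keeps the re-identification table apart from the rows a researcher would see.

```python
import hashlib
import hmac


def make_pseudonym(identifier: str, secret_key: bytes, prefix: str = "R-") -> str:
    """Derive a stable code from an identifier using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return prefix + digest[:12]


# Invented sample data, for illustration only.
patients = [
    {"name": "Alice Example", "diagnosis": "asthma"},
    {"name": "Bob Sample", "diagnosis": "diabetes"},
]

# Assumption: in a real deployment the key is fetched from a separately
# governed system. It is hard-coded here only so the sketch runs on its own.
demo_key = b"demo-key-held-elsewhere"

research_rows = []      # what the researcher receives
reidentification = {}   # stored separately, under its own access controls

for patient in patients:
    code = make_pseudonym(patient["name"], demo_key)
    research_rows.append({"patient": code, "diagnosis": patient["diagnosis"]})
    reidentification[code] = patient["name"]
```

Someone holding only research_rows sees codes and diagnoses. Someone holding the lookup table, or the key plus a list of candidate names, can reverse the codes, which is why the output is still personal data.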

Anonymisation: outside GDPR, but harder than you think

True anonymisation is the gold standard for data that needs to be freely shared, published, or used without restriction. Once data is genuinely anonymised, GDPR does not apply. No consent required. No data subject rights. No breach notification obligations.

The difficulty is achieving it. Recital 26 sets the test: you must consider "all the means reasonably likely to be used" to identify an individual, taking into account "objective factors, such as the costs of and the amount of time required for identification, taking into consideration the available technology." This is not a static assessment. As data linkage techniques improve and more datasets become publicly available, information that was anonymous in 2020 may not be anonymous in 2026.

Research has repeatedly demonstrated this fragility. A 2019 study estimated that 99.98% of Americans could be correctly re-identified in any dataset using just 15 demographic attributes. Netflix famously had its "anonymised" viewing data de-anonymised by researchers who cross-referenced it with public IMDb ratings. The UK's NHS had to withdraw a dataset after researchers demonstrated re-identification from hospital episode statistics.

For document redaction specifically, achieving true anonymisation means removing enough information that no combination of remaining details could identify anyone - even when cross-referenced against other available data. In many cases, this requires removing so much that the document loses its utility. A medical case study stripped of all dates, locations, ages, conditions, and treatment details is no longer a useful case study.
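One rough way to see how remaining details combine is to count how many records share each combination of indirect identifiers. The sketch below uses a k-anonymity style check, which is a common heuristic and not the Recital 26 test itself; the field names and records are invented for illustration, and it only looks inside one dataset, not at the external data an attacker might cross-reference.

```python
from collections import Counter


def _group_counts(rows, quasi_identifiers):
    """Count rows per combination of quasi-identifier values."""
    return Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)


def smallest_group_size(rows, quasi_identifiers):
    """Size of the smallest group sharing the same quasi-identifier values."""
    return min(_group_counts(rows, quasi_identifiers).values())


def unique_rows(rows, quasi_identifiers):
    """Rows whose combination of quasi-identifier values appears only once."""
    counts = _group_counts(rows, quasi_identifiers)
    return [row for row in rows
            if counts[tuple(row[q] for q in quasi_identifiers)] == 1]


# Invented records with names already removed.
records = [
    {"birth_year": 1980, "postcode_area": "LS1", "job": "nurse"},
    {"birth_year": 1980, "postcode_area": "LS1", "job": "nurse"},
    {"birth_year": 1980, "postcode_area": "LS1", "job": "surgeon"},
    {"birth_year": 1975, "postcode_area": "M1", "job": "teacher"},
    {"birth_year": 1975, "postcode_area": "M1", "job": "teacher"},
]

print(smallest_group_size(records, ["birth_year", "postcode_area"]))         # 2
print(smallest_group_size(records, ["birth_year", "postcode_area", "job"]))  # 1
print(unique_rows(records, ["birth_year", "postcode_area", "job"]))
```

Adding one more attribute drops the smallest group from two people to one, so the surgeon's record stands alone even though no name appears anywhere. A group size of one is a warning sign; a larger group size does not by itself prove anonymity.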

Redaction: the practical mechanism

Redaction sits at a different level of abstraction. Where pseudonymisation and anonymisation describe outcomes - "this data can/cannot be linked back to individuals" - redaction describes an action: permanently removing specific content from a document.

You might redact a document to pseudonymise it (removing names but leaving a case reference number that your internal systems can resolve). You might redact a document to anonymise it (removing every piece of identifying information so thoroughly that re-identification becomes impossible). Or you might redact a document for a purpose that does not fit neatly into either category - withholding legally privileged material from a disclosure, for instance, or removing commercially sensitive pricing from a contract before sharing it with a subcontractor.

The technique matters as much as the intent. Overlay redaction - placing a black box over text in a PDF - looks like redaction but often leaves the underlying text intact and recoverable. Pixel-burn redaction permanently destroys the underlying data by rendering the page as an image and replacing redacted areas with solid fills. For any redaction intended to achieve pseudonymisation or anonymisation, the irreversibility of the method is not optional.
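The difference between the two methods can be shown with a toy model. The sketch below is not a PDF library and does not rasterise anything: Page, overlay_redact, and destructive_redact are hypothetical names, and a page is reduced to a tuple of text spans plus a set of spans that have a box drawn over them. It only illustrates why covering content and removing content are different operations.

```python
from dataclasses import dataclass

MASK = "[REDACTED]"


@dataclass(frozen=True)
class Page:
    """Toy stand-in for a page: a text layer plus indices of covered spans."""
    text_spans: tuple[str, ...]
    covered: frozenset[int] = frozenset()

    def visible_text(self) -> list[str]:
        """What a person sees on screen or on paper."""
        return [MASK if i in self.covered else span
                for i, span in enumerate(self.text_spans)]

    def extractable_text(self) -> list[str]:
        """What copy-paste or a text-extraction tool can still read."""
        return list(self.text_spans)


def overlay_redact(page: Page, index: int) -> Page:
    """Draw a box over a span; the text layer is left untouched."""
    return Page(page.text_spans, page.covered | {index})


def destructive_redact(page: Page, index: int) -> Page:
    """Replace the span itself, so the original content no longer exists."""
    spans = list(page.text_spans)
    spans[index] = MASK
    return Page(tuple(spans), page.covered)


page = Page(("Claimant:", "Jane Example", "Policy 123"))
overlaid = overlay_redact(page, 1)
destroyed = destructive_redact(page, 1)

print(overlaid.visible_text())       # the name is hidden from view
print(overlaid.extractable_text())   # the name is still in the text layer
print(destroyed.extractable_text())  # the name is gone
```

Both results look the same to a reader, but only the second survives a copy-paste or a text extraction. A practical check along these lines is to run text extraction on the redacted output and confirm the removed values do not appear.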

Choosing the right technique for your scenario

The choice depends on two questions: what do you need to do with the data afterward, and does anyone need to link it back to specific individuals?

Use pseudonymisation when you need to reduce risk while retaining the ability to re-identify. Internal analytics where the data science team works with coded data but the business needs to act on findings at an individual level. Clinical trials where researchers work with participant codes but the study coordinator must be able to contact participants. Employee satisfaction surveys where HR wants aggregate trends but needs to follow up on specific safeguarding concerns.

Use anonymisation when the data will be published, shared externally without restriction, or used for a purpose where individual identity is irrelevant and re-identification would serve no legitimate purpose. Open research datasets. Public statistics. Training data for machine learning models (though even here, care is needed - models can memorise and reproduce training data).

Use redaction when you are disclosing specific documents and need to remove particular information before release. Subject Access Requests where third-party data must be removed. FOI disclosures where exempted material must be severed. Legal discovery where privileged content must be withheld. Employment tribunal bundles where irrelevant personal data about non-parties must be stripped. Redaction is the workhorse technique for document-level disclosure.
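The guidance above can be condensed into a rule of thumb. The sketch below is a simplification for illustration, not a legal test: the function suggest_technique and its two boolean inputs are hypothetical, and real decisions will turn on context that two flags cannot capture.

```python
from enum import Enum


class Technique(Enum):
    PSEUDONYMISATION = "pseudonymisation"
    ANONYMISATION = "anonymisation"
    REDACTION = "redaction"


def suggest_technique(disclosing_specific_documents: bool,
                      must_relink_to_individuals: bool) -> Technique:
    """Map the two questions onto a starting point for discussion."""
    if disclosing_specific_documents:
        # Document-level disclosure: remove particular content before release.
        # The result may still be pseudonymised or anonymised data,
        # depending on what is left behind.
        return Technique.REDACTION
    if must_relink_to_individuals:
        # Someone still needs to get back to the person behind the record.
        return Technique.PSEUDONYMISATION
    # Identity is irrelevant and the data will be shared or published.
    return Technique.ANONYMISATION


# Scenarios taken from the examples in this section.
print(suggest_technique(False, True).value)   # clinical trial with participant codes
print(suggest_technique(False, False).value)  # open research dataset
print(suggest_technique(True, False).value)   # Subject Access Request
```

The order of the checks reflects the point that redaction is a mechanism rather than an outcome: once the task is document disclosure, redaction is the tool, and whether its result counts as pseudonymised or anonymised is a separate assessment.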

Where organisations get this wrong

Calling something anonymised when it is actually pseudonymised. If your organisation holds the key that links the processed data back to individuals - even if that key is in a different system, even if access is restricted - the data is pseudonymised, not anonymised. GDPR still applies in full. Claiming the data is anonymised to avoid compliance obligations is a regulatory finding waiting to happen.

Using overlay redaction and assuming the data is removed. The text under a black box drawn in a PDF editor can often be recovered by selecting and copying it, extracting the PDF's text content, or using a PDF reader's accessibility features. If the redaction method does not permanently destroy the underlying data, you have not achieved redaction at all - you have achieved the visual appearance of redaction.

Treating pseudonymisation as a one-time event rather than an ongoing control. The EDPB guidelines emphasise that pseudonymisation requires continuous governance - access controls on the re-identification key, regular review of who can access it, and monitoring for changes in the risk environment. An employee who leaves the organisation but retains knowledge of the pseudonymisation scheme represents a control gap.

RedactProof handles the redaction layer of this equation. It identifies 40+ types of personal information automatically, applies pixel-burn redaction that permanently destroys the underlying data, and generates tamper-evident verification certificates that provide an auditable record of what was redacted and when. Whether the outcome of that redaction is pseudonymisation or anonymisation depends on the scope of what you remove - but the permanence of the removal itself is handled by the tool.

Disclaimer: This guide is for informational purposes only and does not constitute legal, medical, or professional advice. Consult a qualified professional for advice specific to your situation.

Frequently Asked Questions

Is pseudonymised data still personal data under GDPR?

Yes. Pseudonymised data remains personal data under GDPR. Article 4(5) defines pseudonymisation as a processing technique, not an exemption from the regulation. All data protection obligations - including lawful basis requirements, data subject rights, breach notification, and security measures - continue to apply to pseudonymised data. The EDPB confirmed this position in its January 2025 Guidelines on Pseudonymisation.

How do I know if my data is truly anonymised?

The test under Recital 26 of GDPR is whether the data can be linked back to an individual using "all the means reasonably likely to be used." This includes considering available technology, the cost and time required for identification, and other datasets that could be cross-referenced. If any realistic path to re-identification exists - even one requiring effort - the data is not truly anonymised. Many organisations overestimate how effective their anonymisation is and underestimate re-identification risk.

Can redaction achieve anonymisation?

It can, but it depends entirely on what you redact and how much context remains. Redacting a name from a document that contains no other identifying details might achieve anonymisation. Redacting a name from a document that still contains a date of birth, postcode, and job title probably does not - those details in combination could identify someone through cross-referencing. True anonymisation through redaction requires removing enough information that re-identification is not reasonably possible, which often means removing more than organisations expect.

What did the EDPB 2025 pseudonymisation guidelines change?

The EDPB Guidelines 01/2025 did not change the law but provided the most detailed regulatory interpretation to date. Key clarifications include: pseudonymised data is always personal data regardless of how strong the pseudonymisation technique is; the effectiveness depends on controls around the re-identification key, not just the technique itself; pseudonymisation can serve as a safeguard for further processing under Article 6(4); and organisations should assess pseudonymisation strength based on the specific context and risks rather than relying on any single technique as inherently sufficient.

