The HIPAA Safe Harbor Checklist: All 18 Identifiers Explained

A compliance officer at a mid-size hospital network gets handed 200 patient records that need sharing with a research partner. The directive sounds simple: de-identify them under HIPAA. But "de-identify" covers two distinct methods with very different requirements - and picking the wrong approach, or applying the right one incompletely, creates liability that sits with the organisation. This guide breaks down the Safe Harbor method identifier by identifier.

By RedactProof Editorial Team · Feb 25, 2026

Two methods, one goal

HIPAA’s Privacy Rule (45 CFR 164.514) recognises two paths to de-identification. Both aim for the same outcome - health information that can no longer identify an individual - but they get there differently.

Safe Harbor is the prescriptive route. Remove 18 specific categories of identifier, confirm you have no actual knowledge that the remaining data could identify someone, and the information is considered de-identified. No statistical analysis required.

Expert Determination takes the opposite approach. A qualified statistical or scientific expert applies accepted methods to certify that the risk of identifying any individual is "very small." The expert documents their methodology and results. More flexible - you might retain data points that Safe Harbor would strip - but it demands specialist expertise and formal certification.

Most organisations handling routine redaction work choose Safe Harbor. It’s concrete, and you can build a repeatable checklist around it. The sections below walk through each of the 18 categories.

The 18 Safe Harbor identifiers

Under 45 CFR 164.514(b)(2), these categories must be removed or generalised. The regulation refers to identifiers "of the individual or of relatives, employers, or household members of the individual."

That last part catches people out. Stripping the patient’s name isn’t enough. A spouse’s name in the emergency contact field, an employer mentioned in an occupational health note, a relative’s phone number listed for after-hours contact - all within scope.

1. Names

All names associated with the individual. Patient names, obviously. But also relatives, employers, and household members appearing anywhere in the record. Nicknames, maiden names, and aliases count.

2. Geographic data smaller than a state

Street addresses, cities, counties, precincts, zip codes, and equivalent geocodes. You can retain the first three digits of a zip code provided that, according to current publicly available Census Bureau data, the geographic unit formed by combining all zip codes sharing those three digits contains more than 20,000 people. Where it contains 20,000 or fewer, replace the three digits with "000."

Most organisations strip full addresses and keep only the state. The three-digit exception adds complexity that rarely justifies itself outside large research datasets where geographic granularity matters.
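The three-digit rule can be sketched in a few lines of Python. This is a minimal illustration, not a compliance tool: the function name `generalize_zip` and the `RESTRICTED_ZIP3` set are assumptions made for this example, and the prefixes in the set are placeholders that would need to be rebuilt from current Census data before any real use.

```python
# Placeholder set of 3-digit prefixes whose combined population is
# 20,000 or fewer. Illustrative values only: derive the real list
# from current Census Bureau data.
RESTRICTED_ZIP3 = frozenset({"036", "059", "102"})


def generalize_zip(zip_code: str, restricted=RESTRICTED_ZIP3) -> str:
    """Return the 3-digit form of a US zip code, or "000" if it must be suppressed."""
    digits = "".join(ch for ch in zip_code if ch.isdigit())
    if len(digits) < 5:
        # Malformed or truncated input: suppress rather than guess.
        return "000"
    prefix = digits[:3]
    return "000" if prefix in restricted else prefix
```

A call such as `generalize_zip("10001-1234")` keeps only the first three digits, while any prefix found in the restricted set comes back as "000". Suppressing malformed input is a conservative design choice for this sketch, not something the regulation spells out.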

3. Dates

All date elements directly related to an individual, except year. Birth dates, admission dates, discharge dates, procedure dates, referral dates, dates of death. For individuals older than 89, the age itself and any date element (including year) that indicates such an age must also go, although these may be aggregated into a single "90 or older" category.

Dates are the identifier most frequently missed during redaction. A discharge summary can contain a dozen different dates woven through narrative text. The year-only exception and the over-89 rule both need human review that automated tools alone won’t resolve.
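For dates that already sit in structured fields, the year-only rule and the age cap can be sketched as below. The function names and the choice to compute age as of a reference date are assumptions for illustration. The threshold follows the regulation’s wording, which covers ages over 89, so a patient who is exactly 89 keeps their birth year. Dates buried in narrative text still need the review described above.

```python
from datetime import date


def age_on(birth: date, reference: date) -> int:
    """Age in completed years on the reference date."""
    years = reference.year - birth.year
    # Subtract one if the birthday has not yet occurred in the reference year.
    if (reference.month, reference.day) < (birth.month, birth.day):
        years -= 1
    return years


def year_only(d: date) -> str:
    """Reduce a date to its year, the only element Safe Harbor lets you keep."""
    return str(d.year)


def safe_harbor_birth_year(birth: date, reference: date) -> str:
    """Keep the birth year unless it would reveal an age over 89."""
    if age_on(birth, reference) > 89:
        return "90 or older"
    return year_only(birth)
```

With a reference date of 25 February 2026, a birth date of 15 March 1967 reduces to "1967", while a birth date of 1 January 1930 falls into the aggregated "90 or older" category.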

4. Telephone numbers

All telephone numbers - landline, mobile, work, pager. Numbers belonging to the individual and to relatives or employers appearing in the record.

5. Fax numbers

Separated from telephone numbers in the regulation. Healthcare still uses fax more than most industries, so these crop up regularly in referral letters and inter-provider correspondence.

6. Email addresses

Personal and work email addresses. These increasingly appear in records as patient portal correspondence gets printed or exported into clinical charts.

7. Social Security numbers

Full or partial SSNs. Some records contain only the last four digits. Those still count as identifiers under Safe Harbor.

8. Medical record numbers

The organisation’s internal patient identifier. MRNs appear in headers, footers, cross-references, lab result labels, and page margins. Easy to catch in the demographics section. Easy to miss on page 40 of a discharge bundle.

9. Health plan beneficiary numbers

Insurance member IDs, Medicaid and Medicare beneficiary numbers, and similar plan identifiers. These commonly appear on insurance authorisation forms, explanation of benefits documents, and billing records filed alongside clinical notes.

10. Account numbers

Bank accounts, billing accounts, and other financial account numbers. Less common in purely clinical records but frequent in billing-related documentation.

11. Certificate and licence numbers

Driver’s licence numbers, professional licence numbers, and similar identifiers. They occasionally surface in identification verification documents stored in the record.

12. Vehicle identifiers and serial numbers

Vehicle identification numbers (VINs) and registration plates. Relevant in accident-related records, emergency department reports, and occupational health documentation involving vehicle incidents.

13. Device identifiers and serial numbers

Serial numbers for medical devices - implants, insulin pumps, pacemakers, prosthetics. Unique Device Identifiers (UDIs) mandated by the FDA appear in surgical records and device registries. These are genuinely identifying because they map back to a specific patient through the manufacturer’s records.

14. Web URLs

URLs that could identify an individual. Patient portal links, personal websites listed in intake forms, URLs in correspondence. Less common in traditional records but growing with telehealth adoption.

15. IP addresses

Internet protocol addresses from telehealth sessions, patient portal access logs, or electronic communications stored in the record.

16. Biometric identifiers

Fingerprints, voiceprints, retinal scans. These appear in identity verification systems at some facilities and in certain specialist records - ophthalmology, for instance.

17. Full-face photographs and comparable images

Photographs where the individual is identifiable. Clinical photographs of a facial condition, ID photos in the record, any image showing enough of the face or body to identify someone. A photograph of a skin lesion on an arm with no face visible and no tattoos or identifying marks would generally fall outside this category - but the assessment needs clinical judgement case by case.

18. Any other unique identifying number, characteristic, or code

The catch-all category. Employee ID numbers, student IDs, unique identifiers assigned by a research study, barcode data that maps back to an individual - anything not covered by categories 1 through 17 that could still identify someone.

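This category also restricts re-identification codes. You cannot remove the patient’s name, replace it with a tracking code, and maintain a separate crosswalk table unless two conditions under 45 CFR 164.514(c) are met: the code is not derived from or related to information about the individual and cannot otherwise be translated back to identify them, and the covered entity neither uses or discloses the code for any other purpose nor discloses the mechanism for re-identification.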
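One way to picture a code that is not derived from the record is a random token, sketched here. The function name `assign_codes` and the sample identifiers are hypothetical, and reading the derivation requirement as "not computed from patient information" is this example’s interpretation of 45 CFR 164.514(c), not a legal determination. A hash of the MRN or SSN, by contrast, would be computed from information about the individual.

```python
import secrets


def assign_codes(patient_ids, nbytes=8):
    """Map each patient identifier to a random code unrelated to the record.

    The returned crosswalk is the re-identification mechanism: it stays
    with the covered entity and is never shipped with the data set.
    """
    crosswalk = {}
    used = set()
    for pid in patient_ids:
        if pid in crosswalk:
            continue  # same patient, same code
        code = secrets.token_hex(nbytes)
        while code in used:  # collisions are very unlikely, but cheap to rule out
            code = secrets.token_hex(nbytes)
        used.add(code)
        crosswalk[pid] = code
    return crosswalk
```

Because the tokens come from `secrets`, nothing about the code reveals the identifier it replaces; the only link is the crosswalk itself.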

The "no actual knowledge" condition

Stripping all 18 identifier types isn’t the end of it. Safe Harbor includes a second condition: the covered entity must "not have actual knowledge that the information could be used alone or in combination with other information to identify an individual who is a subject of the information."

What does this look like? Consider a dataset of patients with a rare genetic condition from a small rural clinic. Even with every identifier removed, the combination of diagnosis, approximate age, and treatment facility might narrow the field to one or two people. If the person performing the de-identification is aware of that risk, Safe Harbor cannot be relied upon for that dataset.

This is not a general obligation to conduct re-identification risk analysis - that sits closer to Expert Determination territory. It is a knowledge-based test. You do not need to hire a statistician. But you cannot close your eyes to what you already know.

When Expert Determination makes more sense

Safe Harbor suits routine disclosures - sharing records with insurers, responding to legal requests, preparing audit documentation. But it strips a lot of data. Removing all dates except year, all geographic detail below state level, and every identifying number leaves datasets that may be too sparse for certain purposes.

Research is where Expert Determination earns its overhead. A clinical study might need specific dates to analyse treatment timelines. An epidemiological study might require partial geographic data to map regional health patterns. An expert can determine which data points to retain while keeping re-identification risk acceptably low.

The expert must document the methods used and the results. HHS has not prescribed specific methodologies, which leaves room for approaches like k-anonymity, l-diversity, or differential privacy techniques. Cost is the trade-off: engaging a qualified expert, producing the documentation, and maintaining it for potential audits involves real time and expense. For a one-off records disclosure, Safe Harbor is almost always the simpler path.
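As a rough illustration of one of those approaches, k-anonymity asks how many records share each combination of quasi-identifying values; the smallest such group is the k. The sketch below is a teaching example with hypothetical field names, not a substitute for an expert’s analysis.

```python
from collections import Counter


def equivalence_classes(records, quasi_identifiers):
    """Count records per unique combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def smallest_class(records, quasi_identifiers):
    """Return k: the size of the smallest group sharing the same values."""
    classes = equivalence_classes(records, quasi_identifiers)
    return min(classes.values()) if classes else 0


def risky_classes(records, quasi_identifiers, k):
    """List the value combinations shared by fewer than k records."""
    classes = equivalence_classes(records, quasi_identifiers)
    return [combo for combo, n in classes.items() if n < k]
```

Given three records where two share an age band and state and the third stands alone, `smallest_class` returns 1, and `risky_classes` with `k=2` points at the lone combination. An expert would typically respond by generalising or suppressing values until every group reaches the chosen threshold.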

Putting the checklist into practice

The 18 categories translate directly into a redaction workflow for each document.

Run automated PII detection across the full text first. RedactProof identifies 40+ personal information types, covering the Safe Harbor categories where they appear as structured data in clinical records.

Headers, footers, and margins deserve specific attention. MRNs, account numbers, and patient identifiers sit in these areas across every page of a multi-page record - the kind of repetition that’s tedious to catch manually but straightforward for automated scanning.
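A simple page-by-page pattern scan shows why this repetition suits automation. The sketch below is a hedged illustration and does not describe how RedactProof works internally: the patterns are deliberately narrow and US-centric, and the MRN format (the letters "MRN" followed by 6 to 10 digits) is invented for the example, since every organisation formats these differently.

```python
import re

# Illustrative patterns only; real records need broader, locally tuned rules.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[ .-]\d{3}[ .-]\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "mrn": re.compile(r"\bMRN[:#]?\s*\d{6,10}\b"),  # hypothetical MRN format
}


def scan_pages(pages):
    """Return (page_number, kind, matched_text) for every hit on every page."""
    hits = []
    for number, text in enumerate(pages, start=1):
        for kind, pattern in PATTERNS.items():
            for match in pattern.finditer(text):
                hits.append((number, kind, match.group()))
    return hits


def pages_with(hits, kind):
    """Sorted page numbers on which a given identifier kind was found."""
    return sorted({page for page, k, _ in hits if k == kind})
```

Running `scan_pages` over the extracted text of each page and then calling `pages_with(hits, "mrn")` gives a quick list of every page whose header or footer still carries the record number.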

Narrative text needs a different approach. "The patient’s daughter Sarah collected the prescription" contains a name that detection tools will flag, but the family relationship context - confirming it falls under "relatives of the individual" - requires human judgement about scope.

Check embedded objects too. Lab report tables sometimes include patient identifiers in column headers. Scanned images may contain handwritten names or reference numbers that OCR can miss.

Once identifiers are marked, apply pixel-burn redaction. This permanently destroys the underlying text. Overlay redaction - a visual black box drawn over content - does not actually remove data and can be reversed, which fails to meet de-identification requirements. If you need an auditable record that the redaction was performed correctly, a tamper-evident verification certificate provides cryptographic confirmation the document hasn’t been modified since processing.
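The idea behind tamper evidence can be sketched with a keyed hash from Python’s standard library. This is a generic illustration under stated assumptions, not a description of RedactProof’s certificate format: the function names are hypothetical, and key storage and rotation are out of scope here.

```python
import hashlib
import hmac


def fingerprint(document: bytes) -> str:
    """SHA-256 digest of the redacted file; any change to the bytes changes it."""
    return hashlib.sha256(document).hexdigest()


def sign(document: bytes, key: bytes) -> str:
    """Keyed HMAC-SHA256 tag, so only a key holder can issue a valid tag."""
    return hmac.new(key, document, hashlib.sha256).hexdigest()


def verify(document: bytes, key: bytes, tag: str) -> bool:
    """True if the document still matches the tag issued at processing time."""
    return hmac.compare_digest(sign(document, key), tag)
```

Storing the tag alongside the redacted file lets an auditor confirm later that the bytes are unchanged. It says nothing about whether the redaction itself was complete, which is why the checklist above still matters.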

Disclaimer: This guide is for informational purposes only and does not constitute legal, medical, or professional advice. Consult a qualified professional for advice specific to your situation.

Frequently Asked Questions

Is de-identified health information still protected under HIPAA?

Once health information has been properly de-identified under either Safe Harbor or Expert Determination, it no longer qualifies as protected health information (PHI). It can be used and disclosed without HIPAA restrictions. Getting the de-identification right is what matters - if identifiers remain, the data is still PHI and all protections apply.

Can I keep the year of birth under Safe Harbor?

Yes, with one exception. Safe Harbor allows retaining the year portion of dates, so a birth date of 15 March 1967 becomes simply 1967. For individuals older than 89, even the year must be removed or aggregated into a single "90 or older" category to prevent identification in populations where very elderly individuals are statistically rare.

What qualifies someone as an expert for Expert Determination?

HHS has not published specific credential requirements. The regulation requires "appropriate knowledge of and experience with generally accepted statistical and scientific principles and methods for rendering information not individually identifiable." In practice, this typically means a statistician, data scientist, or privacy researcher with documented de-identification experience. Both the expert’s qualifications and their methodology must be recorded.

Does Safe Harbor apply to paper records or only electronic ones?

Safe Harbor applies to protected health information in any form - electronic, paper, or oral. When redacting paper medical records before sharing physical copies, the same 18 categories apply. The practical difference is that paper requires scanning and OCR before automated detection tools can help, and manual review becomes particularly important for handwritten content.
