Identifiability Project
L. Sweeney
Patient Identifiability in Pharmaceutical Marketing Data.
Data Privacy Lab Working Paper 1015. Cambridge 2011.
Keywords: HIPAA Privacy Rule, identifiability, data privacy, re-identification
Does pharmaceutical marketing data expose patient records? In 2003, just after the HIPAA Privacy Rule was promulgated,
a major American pharmaceutical company commissioned a report covering nine states to determine how many people in those
states might be at risk of re-identification if the patient pharmacy claims data used for marketing were shared.
In May 2003, the report showed that 2.3% of individuals could be uniquely identified from the de-identified prescription
records then used for marketing, and that 6.1% were identifiable to a binsize of 2 (i.e., the record
either related uniquely to one named person or related indistinguishably to 2 identified people). These results used
prescription information {drug, dosage and refill information, patient diagnosis, patient ZIP inferred from pharmacy
ZIP, prescription fill date}. No explicit patient identifiers (e.g., name or address) appeared in the data.
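The binsize measure can be made concrete with a short sketch. It assumes each prescription record has already been reduced to a linking key and counts how many named people in a population register (such as a voter list) share that key. A record with binsize 1 is unique, and "identifiable to a binsize of 2" is read here as matching one or two named people. The key fields (ZIP, birth date, sex), the function name `binsize_fractions`, and the toy rows are hypothetical illustrations, not the report's data or code.

```python
from collections import Counter


def binsize_fractions(record_keys, register_keys, max_k=2):
    """For each k in 1..max_k, return the fraction of records whose key is
    shared by between 1 and k named people in the population register.

    record_keys:   one linking key (tuple of quasi-identifiers) per record.
    register_keys: one key per named person in the register (e.g. a voter list).
    A record that matches nobody (binsize 0) is not counted as identifiable.
    """
    if not record_keys:
        return {k: 0.0 for k in range(1, max_k + 1)}
    # Number of named people sharing each key value.
    people_per_key = Counter(register_keys)
    # Binsize of a record = how many named people it is indistinguishable among.
    sizes = [people_per_key[key] for key in record_keys]  # missing keys count as 0
    n = len(sizes)
    return {k: sum(1 for s in sizes if 1 <= s <= k) / n for k in range(1, max_k + 1)}


# Hypothetical (ZIP, birth date, sex) keys, invented for illustration only.
RECORDS = [
    ("02138", "1950-07-31", "F"),  # matches 1 person  -> unique
    ("02139", "1962-01-05", "M"),  # matches 2 people  -> binsize 2
    ("02139", "1971-03-12", "F"),  # matches 3 people  -> binsize 3
    ("02140", "1980-11-02", "M"),  # matches nobody    -> not identifiable
]
REGISTER = [
    ("02138", "1950-07-31", "F"),
    ("02139", "1962-01-05", "M"), ("02139", "1962-01-05", "M"),
    ("02139", "1971-03-12", "F"), ("02139", "1971-03-12", "F"),
    ("02139", "1971-03-12", "F"),
]

fractions = binsize_fractions(RECORDS, REGISTER)
print(fractions)
```

On these toy rows, one of four records is unique and two of four are identifiable to a binsize of 2, so the binsize-2 fraction includes the unique records, mirroring how the 6.1% figure relates to the 2.3% figure.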
The prescribing doctor was not uniquely identified. Results were based on the states: New York, Illinois, Michigan,
Massachusetts, Florida, California, Pennsylvania, Texas, and Arizona. The primary means of re-identification
was linking the prescription records to ambulatory and hospital discharge data using patient {diagnosis, inferred
ZIP, and drug, dosage and refill information} to learn more patient demographics and then linking that result
to a voter list (or other population register) to learn the names of the subjects of the prescriptions.
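A minimal sketch of the two-step linkage described above, assuming each data source is a list of Python dicts. The field names, the choice of (ZIP, birth date, sex) as the demographic join key, the simplification of "drug, dosage and refill information" to a single `drug` field, and the toy rows are all hypothetical; the report's actual fields and matching rules may differ.

```python
from collections import defaultdict


def index_by(rows, fields):
    """Group rows (dicts) by the tuple of their values in `fields`."""
    index = defaultdict(list)
    for row in rows:
        index[tuple(row[f] for f in fields)].append(row)
    return index


def link_two_hops(prescriptions, discharges, voters, rx_keys, demo_keys):
    """Return, for each prescription record, the set of voter names it could belong to.

    Hop 1: match prescriptions to discharge records on rx_keys to learn demographics.
    Hop 2: match those demographics to the voter list on demo_keys to learn names.
    """
    discharge_index = index_by(discharges, rx_keys)
    voter_index = index_by(voters, demo_keys)
    candidates = []
    for rx in prescriptions:
        names = set()
        for d in discharge_index.get(tuple(rx[f] for f in rx_keys), []):
            demo = tuple(d[f] for f in demo_keys)
            names.update(v["name"] for v in voter_index.get(demo, []))
        candidates.append(names)
    return candidates


# Hypothetical join keys and toy rows, invented for illustration only.
RX_KEYS = ("diagnosis", "zip", "drug")
DEMO_KEYS = ("zip", "birth_date", "sex")

PRESCRIPTIONS = [
    {"diagnosis": "E11", "zip": "02138", "drug": "metformin 500mg"},
    {"diagnosis": "I10", "zip": "02139", "drug": "lisinopril 10mg"},
]
DISCHARGES = [
    {"diagnosis": "E11", "zip": "02138", "drug": "metformin 500mg",
     "birth_date": "1950-07-31", "sex": "F"},
    {"diagnosis": "I10", "zip": "02139", "drug": "lisinopril 10mg",
     "birth_date": "1962-01-05", "sex": "M"},
]
VOTERS = [
    {"name": "Voter A", "zip": "02138", "birth_date": "1950-07-31", "sex": "F"},
    {"name": "Voter B", "zip": "02139", "birth_date": "1962-01-05", "sex": "M"},
    {"name": "Voter C", "zip": "02139", "birth_date": "1962-01-05", "sex": "M"},
]

candidates = link_two_hops(PRESCRIPTIONS, DISCHARGES, VOTERS, RX_KEYS, DEMO_KEYS)
print(candidates)
```

Each returned set is a record's bin of candidate names, so its size is the binsize: here the first record links to one person and the second is indistinguishable between two.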
In comparison, data de-identified under the HIPAA Safe Harbor tend to allow re-identification of about 0.04% of the
population. In general, then, these data put more personal information at risk than the HIPAA Safe Harbor does; however,
re-identification rates vary from state to state, and some states have re-identification rates lower than the HIPAA
Safe Harbor. Other privacy observations found in the data, but not part of the analysis, include: (1) the data did
not segment or restrict access to special medical classes protected by law, such as psychiatric and HIV-related
prescriptions; and (2) the data made it possible to construct a patient's prescription profile over time, which
could further increase re-identification risk. This paper summarizes the earlier 2003 report, reviews subsequent
publications, and applies the emerging scientific-legal approach of comparing re-identification rates to the HIPAA Safe Harbor.
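Benchmarking against the HIPAA Safe Harbor amounts to comparing each observed rate with the roughly 0.04% baseline. The sketch below uses the 2.3% unique-identification figure from the report; the second entry is a hypothetical per-state value, included only to show a rate falling below the baseline, and `compare_to_safe_harbor` is an illustrative name rather than an established method.

```python
# Baseline from the text: Safe Harbor data tend to allow re-identification of ~0.04%.
SAFE_HARBOR_RATE = 0.0004


def compare_to_safe_harbor(rates, baseline=SAFE_HARBOR_RATE):
    """Map each label to (rate / baseline, whether the rate exceeds the baseline)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return {label: (rate / baseline, rate > baseline) for label, rate in rates.items()}


comparison = compare_to_safe_harbor({
    "nine-state report, unique (2.3%)": 0.023,
    "hypothetical low-risk state": 0.0003,  # made-up value for illustration
})
for label, (ratio, exceeds) in comparison.items():
    print(f"{label}: {ratio:.1f}x baseline, exceeds={exceeds}")
```

With these inputs, the report's unique-identification rate comes out at about 57.5 times the Safe Harbor baseline, while the made-up state rate falls below it.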
In the end, though, this paper shows de-identification risk measurement at its best while exposing the perils of de-identification as a regime.
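Observation (2), that fills could be assembled into a prescription profile over time, matters because adding fills to a profile can only split groups of look-alike patients into smaller groups, never merge them. A minimal sketch, assuming the data carry some persistent pseudonymous patient token (hypothetical here) and using made-up fills: no patient is unique on the first fill alone, but one becomes unique once both fills are combined. The names `build_profiles` and `unique_fraction` are illustrative.

```python
from collections import Counter, defaultdict


def build_profiles(fills):
    """Group (patient_token, fill_date, drug) tuples into one time-ordered profile per token."""
    by_token = defaultdict(list)
    for token, fill_date, drug in fills:
        by_token[token].append((fill_date, drug))
    # ISO-format dates sort chronologically as strings.
    return {token: tuple(sorted(items)) for token, items in by_token.items()}


def unique_fraction(keys):
    """Fraction of keys that occur exactly once."""
    if not keys:
        return 0.0
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(keys)


# Hypothetical fills for three pseudonymous patients.
FILLS = [
    ("p1", "2003-01-05", "drug X"), ("p1", "2003-02-05", "drug Y"),
    ("p2", "2003-01-05", "drug X"), ("p2", "2003-02-05", "drug Z"),
    ("p3", "2003-01-05", "drug X"), ("p3", "2003-02-05", "drug Y"),
]

profiles = build_profiles(FILLS)
first_fill_only = [profile[0] for profile in profiles.values()]
full_profiles = list(profiles.values())
print(unique_fraction(first_fill_only), unique_fraction(full_profiles))
```

Because two patients with identical full profiles necessarily share every individual fill, the profile-based grouping refines the single-fill grouping, so uniqueness can only stay the same or rise as fills accumulate.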