Identifiability Project

Patient Identifiability in Pharmaceutical Marketing Data

by Latanya Sweeney


Does pharmaceutical marketing data expose patient records? In 2003, just after the promulgation of the HIPAA Privacy Rule, a major American pharmaceutical company commissioned a report covering 9 states to determine how many people in those states might be at risk of identification if patient pharmacy claims data used for marketing were shared. The report, delivered in May 2003, showed that 2.3% of individuals could be uniquely identified from the de-identified prescription records then used for marketing purposes, and that 6.1% were identifiable to a binsize of 2 (i.e., the record either related uniquely to one named person or related indistinguishably to 2 identified people). These results used the prescription information {drug, dosage and refill information, patient diagnosis, patient ZIP inferred from pharmacy ZIP, prescription fill date}. No explicit patient identifiers (e.g., name or address) appeared in the data, and the prescribing doctor was not uniquely identified. Results were based on the following states: New York, Illinois, Michigan, Massachusetts, Florida, California, Pennsylvania, Texas, and Arizona. The primary means of re-identification was to link the prescription records to ambulatory and hospital discharge data on patient {diagnosis, inferred ZIP, and drug, dosage and refill information} to learn more patient demographics, and then to link that result to a voter list (or other population register) to learn the names of the subjects of the prescriptions. By comparison, data de-identified under the HIPAA Safe Harbor tend to allow re-identification of about 0.04% of the population. In general, then, these data put more personal information at risk than the HIPAA Safe Harbor does; however, re-identification rates vary from state to state, and some states have rates lower than the HIPAA Safe Harbor.
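The two-step linkage described above, and the bin sizes it produces, can be sketched in Python. This is a minimal illustration under stated assumptions, not the method used in the 2003 report: the field names, the choice of {birth date, gender, ZIP} as the voter-list join keys, and the helper names `index_by`, `bin_sizes`, and `share_within` are all hypothetical. A record's bin size here is the number of distinct named people it links to, so the "uniquely identified" share corresponds to bin size 1 and the "binsize of 2" share to bin sizes of 1 or 2.

```python
from collections import defaultdict

# Hypothetical field names: the text lists the attributes used for linking,
# not a schema, so these keys are illustrative only.
RX_LINK = ("diagnosis", "zip", "drug", "dosage", "refills")
# Assumption: the voter list is joined on birth date, gender, and ZIP.
VOTER_LINK = ("birth_date", "gender", "zip")


def index_by(rows, keys):
    """Group rows (dicts) by the tuple of their values for `keys`."""
    index = defaultdict(list)
    for row in rows:
        index[tuple(row[k] for k in keys)].append(row)
    return index


def bin_sizes(prescriptions, discharges, voters):
    """Return, for each prescription record, how many named people it links to.

    Step 1 joins a prescription to discharge records on RX_LINK to learn
    demographics; step 2 joins those demographics to the voter list on
    VOTER_LINK to learn names. A bin size of 0 means no link was found.
    """
    discharge_index = index_by(discharges, RX_LINK)
    voter_index = index_by(voters, VOTER_LINK)
    sizes = []
    for rx in prescriptions:
        names = set()
        for d in discharge_index.get(tuple(rx[k] for k in RX_LINK), []):
            for v in voter_index.get(tuple(d[k] for k in VOTER_LINK), []):
                names.add(v["name"])
        sizes.append(len(names))
    return sizes


def share_within(sizes, k):
    """Fraction of records linked to at least 1 and at most k named people."""
    if not sizes:
        return 0.0
    return sum(1 for s in sizes if 1 <= s <= k) / len(sizes)
```

Calling `share_within(sizes, 1)` and `share_within(sizes, 2)` on the output of `bin_sizes` yields the two kinds of figures quoted above. Exact-match joins keep the sketch short; a real linkage would also have to cope with inconsistent coding, missing values, and the approximation introduced by inferring patient ZIP from pharmacy ZIP.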
Other privacy observations found in the data, but not part of the analysis, include: (1) the data did not segment or restrict access to special medical classes protected by law, such as psychiatric and HIV-related prescriptions; and (2) the data made it possible to construct a patient's prescription profile over time, which could further increase re-identification risk. This paper summarizes the earlier 2003 report, reviews subsequent publications, and applies the emergent scientific-legal approach of comparing re-identification rates to the HIPAA Safe Harbor. In the end, though, this paper demonstrates the strengths of measuring de-identification risk while also exposing the perils of de-identification as a regime.
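Comparing a measured re-identification rate against the HIPAA Safe Harbor baseline amounts to a simple ratio test. The sketch below assumes rates are expressed as fractions of the population and takes the roughly 0.04% baseline from the text; the constant `SAFE_HARBOR_RATE` and the function `compare_to_safe_harbor` are hypothetical names introduced for illustration.

```python
# Baseline taken from the text: data de-identified under the HIPAA Safe Harbor
# are re-identifiable at roughly 0.04% of the population (expressed as a fraction).
SAFE_HARBOR_RATE = 0.0004


def compare_to_safe_harbor(rate, baseline=SAFE_HARBOR_RATE):
    """Return (ratio, exceeds) for an observed re-identification rate.

    `ratio` is how many times larger the observed rate is than the baseline;
    `exceeds` is True when the data put more people at risk than Safe Harbor.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    ratio = rate / baseline
    return ratio, ratio > 1.0
```

Applied to the report's aggregate unique-identification rate of 2.3% (`0.023`), the ratio is about 57.5. Applied state by state, the same test would also flag the states whose rates fall below the baseline, which is the variability the abstract notes.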
L. Sweeney. Patient Identifiability in Pharmaceutical Marketing Data. Data Privacy Lab Working Paper 1015. Cambridge, 2011.
Keywords: HIPAA Privacy Rule, identifiability, data privacy, re-identification
Related links:

Patient Privacy Risks in U.S. Supreme Court Case Sorrell v. IMS Health Inc.
U.S. Supreme Court Case Sorrell v. IMS Health Inc.
Simple Demographics Often Identify People Uniquely.
k-anonymity: a model for protecting privacy.
Privacert Risk Assessment.
Achieving k-anonymity privacy protection using generalization and suppression.