Re-Identification of Maine and Vermont Health Data

Risks to Patient Privacy: a re-identification of patients in Maine and Vermont statewide hospital data

by Ji Su Yoo, Alexandra Thaler, Latanya Sweeney, and Jinyan Zang

Abstract

Forty-eight states in the United States collect statewide inpatient discharge data that include personal health information about each patient's hospital visit. A 2013 survey found that 33 of those states subsequently sold or otherwise disclosed copies of the data, but only 3 states de-identified the data to the standards established under the Health Insurance Portability and Accountability Act (HIPAA), the U.S. federal regulation that dictates the rules by which personal health information is shared. Did the other 30 states put personal health data at risk? To answer this question, Sweeney tested whether Washington State's hospital data were vulnerable to re-identification. The study showed that Washington State's inpatient data allowed for the correct and unique matching of 35 of 81 (or 43 percent) local newspaper stories to anonymized hospital visits. After the study, Washington State improved its anonymization standard for publicly available data and added an application process for others to receive more detailed non-public discharge data. Despite this successful outcome, many states were not convinced that the same re-identification strategy would succeed on their datasets. One reason was a belief that Washington State was more vulnerable because it shared patient age in months, a practice that many other states do not follow. Is this correct? Are other states exempt from this re-identification strategy? To find out, we repeated the approach on statewide health data from 2010 in Maine and 2011 in Vermont using a total of 291 local news stories.

Results Summary: We found that 69 of the 244 names (or 28.3 percent) from local news stories uniquely matched to one and only one hospitalization in the Maine hospital data. Even if redacted to the HIPAA Safe Harbor standard, the Maine data would still allow 8 matches (or 3.3 percent). In Vermont, we found that 16 of the 47 names (or 34 percent) from the news stories uniquely matched to one hospitalization. If the Vermont data complied with the HIPAA Safe Harbor standard, 5 matches would result (or 10.6 percent). Such findings suggest that patients are vulnerable to re-identification even when hospital data are de-identified according to HIPAA Safe Harbor guidelines. We call for more rigorous inquiry into the vulnerabilities that exist even when following HIPAA Safe Harbor as a standard for de-identification.
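The reported percentages follow directly from the counts in the abstract. The short check below recomputes them; the helper name match_rate and the labels are illustrative only, while the counts are those stated above.

```python
def match_rate(matched: int, total: int) -> float:
    """Percentage of names uniquely matched, rounded to one decimal place."""
    return round(100.0 * matched / total, 1)


# (matched, total) counts as stated in the abstract.
counts = {
    "Washington (prior study)": (35, 81),
    "Maine, data as shared": (69, 244),
    "Maine, HIPAA Safe Harbor": (8, 244),
    "Vermont, data as shared": (16, 47),
    "Vermont, HIPAA Safe Harbor": (5, 47),
}

rates = {label: match_rate(m, n) for label, (m, n) in counts.items()}
for label, rate in rates.items():
    print(f"{label}: {rate} percent")

# The Maine and Vermont story counts add up to the 291 stories used in total.
total_stories = 244 + 47
print(total_stories)
```

The recomputed values agree with the rounded figures in the text (43, 28.3, 3.3, 34, and 10.6 percent).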


To Appear in the Journal of Technology Science on November 7, 2017.

Citation:

Yoo J, Thaler A, Sweeney L, and Zang J. Risks to Patient Privacy: a re-identification of patients in Maine and Vermont statewide hospital data. Technology Science. 2017110701. November 7, 2017. https://techscience.org/a/2017110701

A white paper version is available as:

Yoo J, Thaler A, Sweeney L, and Zang J. Risks to Patient Privacy: a re-identification of patients in Maine and Vermont statewide hospital data. Harvard University. Data Privacy Lab. White Paper. Oct 25, 2017.
