Identifiability Project
In this document, I report on experiments I conducted using 1990 U.S. Census summary
data to determine how many individuals within geographically situated populations had
combinations of demographic values that occurred infrequently. The experiments found that a few
characteristics often combine in populations to uniquely or nearly uniquely identify some
individuals. Clearly, released data containing such information about these individuals should not
be considered anonymous. Yet health and other person-specific data are publicly available in this
form. Here are some surprising results using only three fields of information, even though typical
data releases contain many more fields. About 87% (216 million of 248 million) of the
population in the United States had reported characteristics that likely made them unique based
only on {5-digit ZIP, gender, date of birth}. About half of the U.S. population (132 million of 248
million, or 53%) is likely to be uniquely identified by only {place, gender, date of birth}, where
place is basically the city, town, or municipality in which the person resides. Even at the
county level, {county, gender, date of birth} is likely to uniquely identify 18% of the U.S.
population. In general, few characteristics are needed to uniquely identify a person.
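The uniqueness measure described above can be illustrated with a short sketch. It assumes record-level input (one row per person), whereas the study worked from Census summary data and estimated how many people were likely unique. This sketch is therefore a simpler, exact count, not the paper's method. The function name `fraction_unique`, the field names, and the four records are hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping, Sequence


def fraction_unique(records: Sequence[Mapping[str, str]],
                    quasi_identifiers: Iterable[str]) -> float:
    """Return the share of records whose quasi-identifier values occur exactly once.

    Hypothetical helper for illustration only: it counts exact uniqueness in
    record-level data rather than estimating it from summary tables.
    """
    if not records:
        return 0.0
    keys = tuple(quasi_identifiers)
    # Project each record onto the chosen fields, e.g. (ZIP, gender, date of birth).
    combos = [tuple(r[k] for k in keys) for r in records]
    counts = Counter(combos)
    # A record is unique when no other record shares its combination of values.
    unique = sum(1 for c in combos if counts[c] == 1)
    return unique / len(records)


# Made-up records, not Census data.
records = [
    {"zip": "15213", "county": "Allegheny", "gender": "F", "dob": "1961-07-31"},
    {"zip": "15213", "county": "Allegheny", "gender": "M", "dob": "1961-07-31"},
    {"zip": "15217", "county": "Allegheny", "gender": "F", "dob": "1961-07-31"},
    {"zip": "15217", "county": "Allegheny", "gender": "F", "dob": "1975-02-14"},
]

print(fraction_unique(records, ["zip", "gender", "dob"]))     # 1.0
print(fraction_unique(records, ["county", "gender", "dob"]))  # 0.5
```

Coarsening the geography from ZIP to county places the first and third records in the same group, so the unique share falls from 1.0 to 0.5. This is the same direction as the drop from 87% to 18% reported above.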
L. Sweeney. Simple Demographics Often Identify People Uniquely. Carnegie Mellon University, Data Privacy Working Paper 3. Pittsburgh, 2000.
Related links: