De-identification Project
In this paper we address the problem of releasing person-specific data while, at the same time,
safeguarding the anonymity of individuals to whom the data refer. The approach is based on the definition
of k-anonymity. A table provides k-anonymity if attempts to link explicitly
identifying information to its contents ambiguously map the information to at least k entities.
We illustrate how k-anonymity can be provided by using generalization and suppression techniques.
We introduce the concept of minimal generalization, which captures the property of the release process
not to distort the data more than needed to achieve k-anonymity.
We illustrate possible preference policies to choose among different minimal generalizations.
Finally, we present an algorithm and experimental results obtained when an implementation of the algorithm
was used to produce releases of real medical information. We also report on the quality of the released
data by measuring precision and completeness of the results for different values of k.
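
The k-anonymity condition and the idea of generalizing no more than necessary can be illustrated with a minimal sketch. It is not the algorithm presented in the paper: the table, the attribute names (zip, sex, diagnosis), and the helper functions are hypothetical, and generalization is reduced to masking trailing ZIP digits on a single attribute.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in counts.values())


def generalize_zip(row, digits_kept):
    """Generalize a ZIP code by replacing its trailing digits with '*'."""
    zip_code = row["zip"]
    masked = zip_code[:digits_kept] + "*" * (len(zip_code) - digits_kept)
    return {**row, "zip": masked}


def least_zip_generalization(rows, quasi_identifiers, k, zip_length=5):
    """Return the largest number of ZIP digits that can be kept while
    still satisfying k-anonymity, or None if even full masking fails.
    Trying the least distorted table first mirrors the idea of not
    generalizing more than needed (simplified to one attribute)."""
    for digits_kept in range(zip_length, -1, -1):
        candidate = [generalize_zip(row, digits_kept) for row in rows]
        if is_k_anonymous(candidate, quasi_identifiers, k):
            return digits_kept
    return None


# Hypothetical toy table; no real medical data.
rows = [
    {"zip": "02138", "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "sex": "F", "diagnosis": "flu"},
    {"zip": "02141", "sex": "M", "diagnosis": "diabetes"},
    {"zip": "02142", "sex": "M", "diagnosis": "flu"},
]
quasi_identifiers = ["zip", "sex"]

raw_is_safe = is_k_anonymous(rows, quasi_identifiers, 2)
digits = least_zip_generalization(rows, quasi_identifiers, 2)
print(raw_is_safe, digits)
```

In this toy table every raw (zip, sex) pair is unique, so the raw table is not 2-anonymous. Masking only the last ZIP digit is already enough, so masking further digits would distort the data more than needed.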
Keywords: data anonymity, data privacy, re-identification, data fusion, privacy
Citation:
This same paper also appears as Protecting Respondents' Identities in Microdata Release,
IEEE Transactions on Knowledge and Data Engineering, 2001.
See links below for authoritative sources of k-anonymity.
Abstract
Today's globally networked society places great demand on the dissemination and sharing
of person-specific data. Situations where aggregate statistical information was once the reporting
norm now rely heavily on the transfer of microscopically detailed transaction and encounter information.
This happens at a time when more and more historically public information is also electronically
available. When these data are linked together, they provide an electronic shadow of a person
or organization that is as identifying and personal as a fingerprint, even when the sources
of the information contain no explicit identifiers, such as name and phone number.
In order to protect the anonymity of individuals to whom released data refer, data
holders often remove or encrypt explicit identifiers such as names, addresses and phone numbers.
However, other distinctive data, which we term quasi-identifiers, often combine
uniquely and can be linked to publicly available information to re-identify individuals.
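
The linking step described above can be sketched as a join on quasi-identifier values. This is an illustrative assumption about how such a link might be computed, not a procedure taken from the paper. Both tables, all names, and the `link_records` helper are hypothetical.

```python
def link_records(released, public, quasi_identifiers):
    """Match released records to named public records on quasi-identifiers.
    A released record counts as re-identified only when exactly one
    public record shares its quasi-identifier values."""
    index = {}
    for person in public:
        key = tuple(person[q] for q in quasi_identifiers)
        index.setdefault(key, []).append(person["name"])

    matches = []
    for record in released:
        key = tuple(record[q] for q in quasi_identifiers)
        candidates = index.get(key, [])
        if len(candidates) == 1:  # unambiguous link
            matches.append((candidates[0], record["diagnosis"]))
    return matches


# Hypothetical released table: explicit identifiers already removed.
released = [
    {"zip": "02138", "birth_year": 1960, "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1975, "sex": "M", "diagnosis": "flu"},
]

# Hypothetical public list that carries names alongside the same attributes.
public = [
    {"name": "A. Example", "zip": "02138", "birth_year": 1960, "sex": "F"},
    {"name": "B. Example", "zip": "02139", "birth_year": 1975, "sex": "M"},
    {"name": "C. Example", "zip": "02139", "birth_year": 1975, "sex": "M"},
]

reidentified = link_records(released, public, ["zip", "birth_year", "sex"])
print(reidentified)
```

Here the first released record links to exactly one named person and is re-identified. The second matches two people and stays ambiguous, which is the situation k-anonymity aims to guarantee for every record.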
Pierangela Samarati and L. Sweeney. k-anonymity: a model for protecting privacy.
Proceedings of the IEEE Symposium on Research
in Security and Privacy (S&P). May 1998, Oakland, CA.
Related Links