De-identification Project |
In this paper, we provide a computational disclosure technique for releasing information from a private
table such that the identity of any individual to whom the released data refer cannot be denitively
recognized. Our approach protects against linking to other data. It is based on the concepts of gen-
eralization, by which stored values can be replaced with semantically consistent and truthful but less
precise alternatives, and of k-anonymity. A table is said to provide k-anonymity when the contained
data do not allow the recipient to associate the released information to a set of individuals smaller than
k. We introduce the notions of generalized table and of minimal generalization of a table with respect to
a k-anonymity requirement. As an optimization problem, the objective is to minimally distort the data
while providing adequate protection. We describe an algorithm that, given a table, eciently computes
a preferred minimal generalization to provide anonymity.
Keywords: data anonymity, data privacy, re-identification, data fusion, privacy
Citation:
See links below for authorative sources of k-anonymity.
Abstract
The proliferation of information on the Internet and access to fast computers with large storage capaci-
ties has increased the volume of information collected and disseminated about individuals. The existence
os these other data sources makes it much easier to re-identify individuals whose private information is
released in data believed to be anonymous. At the same time, increasing demands are made on organi-
zations to release individualized data rather than aggregate statistical information. Even when explicit
identiers, such as name and phone number, are removed or encrypted when releasing individualized
data, other characteristic data, which we term quasi-identiers, can exist which allow the data recipient
to re-identify individuals to whom the data refer.
Pierangela Samarati and
L. Sweeney.
Generalizing Data to Provide Anonymity when Disclosing Information
ACM Principles of Database
Systems (PODS). Seattle, WA, USA, 1998.
(PDF).
Related Links