De-identification Project

Generalizing Data to Provide Anonymity when Disclosing Information

by Pierangela Samarati and Latanya Sweeney, Ph.D.

Abstract

The proliferation of information on the Internet and access to fast computers with large storage capacities have increased the volume of information collected and disseminated about individuals. The existence of these other data sources makes it much easier to re-identify individuals whose private information is released in data believed to be anonymous. At the same time, increasing demands are made on organizations to release individualized data rather than aggregate statistical information. Even when explicit identifiers, such as name and phone number, are removed or encrypted when releasing individualized data, other characteristic data, which we term quasi-identifiers, can exist which allow the data recipient to re-identify individuals to whom the data refer.

In this paper, we provide a computational disclosure technique for releasing information from a private table such that the identity of any individual to whom the released data refer cannot be definitively recognized. Our approach protects against linking to other data. It is based on the concepts of generalization, by which stored values can be replaced with semantically consistent and truthful but less precise alternatives, and of k-anonymity. A table is said to provide k-anonymity when the contained data do not allow the recipient to associate the released information to a set of individuals smaller than k. We introduce the notions of generalized table and of minimal generalization of a table with respect to a k-anonymity requirement. As an optimization problem, the objective is to minimally distort the data while providing adequate protection. We describe an algorithm that, given a table, efficiently computes a preferred minimal generalization to provide anonymity.
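The two ideas in the paragraph above, k-anonymity and generalization, can be illustrated with a small sketch. It is not the algorithm from the paper: it assumes a table held as a non-empty list of dictionaries, generalizes only a single hypothetical `zip` attribute by masking trailing digits, and searches the generalization levels linearly. The function names, the column names, and the sample rows are all invented for illustration.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(c >= k for c in counts.values())


def generalize_zip(row, digits_kept):
    """Mask trailing ZIP digits with '*': a truthful but less precise value."""
    z = row["zip"]
    return {**row, "zip": z[:digits_kept] + "*" * (len(z) - digits_kept)}


def minimal_zip_generalization(rows, quasi_identifiers, k):
    """Try progressively coarser ZIP codes and return the least coarse level
    (digits kept, generalized rows) that satisfies k-anonymity, or None."""
    zip_len = len(rows[0]["zip"])  # assumes a non-empty table with equal-length ZIPs
    for digits_kept in range(zip_len, -1, -1):
        candidate = [generalize_zip(r, digits_kept) for r in rows]
        if is_k_anonymous(candidate, quasi_identifiers, k):
            return digits_kept, candidate
    return None


# Hypothetical sample data; 'zip' and 'sex' play the role of quasi-identifiers.
rows = [
    {"zip": "02138", "sex": "F", "diagnosis": "flu"},
    {"zip": "02139", "sex": "F", "diagnosis": "asthma"},
    {"zip": "02141", "sex": "M", "diagnosis": "flu"},
    {"zip": "02142", "sex": "M", "diagnosis": "diabetes"},
]

result = minimal_zip_generalization(rows, ["zip", "sex"], k=2)
```

With these sample rows and k = 2, the original table is not 2-anonymous because every (zip, sex) pair is unique. The search stops once one trailing digit is masked, giving the groups (0213*, F) and (0214*, M) with two rows each. Masking fewer digits fails the requirement and masking more would distort the data further than needed, which is the sense of "minimal" in this simplified setting.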

Keywords: data anonymity, data privacy, re-identification, data fusion, privacy

Citation:
Pierangela Samarati and L. Sweeney. Generalizing Data to Provide Anonymity when Disclosing Information. ACM Principles of Database Systems (PODS). Seattle, WA, USA, 1998. (PDF).

See links below for authoritative sources of k-anonymity.

Related Links


Summer 2003 Data Privacy Lab [De-identification Project]