Data Anonymization Project
Keywords: data anonymity, data privacy, re-identification, data fusion, privacy
Citation:
Abstract
We present a computer program named Datafly that uses computational disclosure techniques to
maintain anonymity in medical data by automatically generalizing, substituting and removing information
as appropriate without losing many of the details found within the data. Decisions are made at
the field and record level at the time of database access, so the approach can be used on the fly in
role-based security within an institution, and in batch mode for exporting data from an institution.
Organizations often release and receive medical data with all explicit identifiers, such as name, address,
phone number, and Social Security number, removed, in the incorrect belief that patient confidentiality
is maintained because the resulting data look anonymous. However, we show that in most of
these cases the remaining data can be used to re-identify individuals, either by linking or matching the data
to other databases or by looking at unique characteristics found in the fields and records of the database
itself. When these less apparent aspects are taken into account, each released record can be
made to map ambiguously to many possible people, providing a level of anonymity that the user determines.
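The idea of making each released record map to many possible people can be illustrated with a small sketch. This is not the Datafly program itself, only a simplified illustration under stated assumptions: the field names (zip, birth_year, diagnosis), the function names, and the rule of masking trailing characters with '*' are all hypothetical. The sketch repeatedly generalizes whichever identifying field has the most distinct values until every combination of those fields is shared by at least k records, and it removes any records that still fall short.

```python
from collections import Counter

# Hypothetical field names: the fields an outsider could link to other databases.
QUASI_IDENTIFIERS = ("zip", "birth_year")


def generalize(value, level):
    """Mask the last `level` characters of a string value with '*'."""
    keep = max(len(value) - level, 0)
    return value[:keep] + "*" * (len(value) - keep)


def anonymize(records, k):
    """Return a copy of `records` in which every combination of
    quasi-identifier values is shared by at least k records.

    Assumes every record stores the quasi-identifiers as strings.
    """
    if not records:
        return []

    levels = {f: 0 for f in QUASI_IDENTIFIERS}
    max_level = {f: max(len(r[f]) for r in records) for f in QUASI_IDENTIFIERS}

    def qi_key(record):
        return tuple(record[f] for f in QUASI_IDENTIFIERS)

    while True:
        # Build the current generalized view; other fields are left untouched.
        current = [
            {**r, **{f: generalize(r[f], levels[f]) for f in QUASI_IDENTIFIERS}}
            for r in records
        ]
        counts = Counter(qi_key(r) for r in current)
        if all(c >= k for c in counts.values()):
            return current

        candidates = [f for f in QUASI_IDENTIFIERS if levels[f] < max_level[f]]
        if not candidates:
            # Nothing left to generalize: remove records that remain too rare.
            return [r for r in current if counts[qi_key(r)] >= k]

        # Generalize the field that currently has the most distinct values.
        field = max(candidates, key=lambda f: len({r[f] for r in current}))
        levels[field] += 1


# Made-up sample data for illustration only.
records = [
    {"zip": "02138", "birth_year": "1960", "diagnosis": "flu"},
    {"zip": "02139", "birth_year": "1960", "diagnosis": "asthma"},
    {"zip": "02141", "birth_year": "1961", "diagnosis": "fracture"},
    {"zip": "02142", "birth_year": "1961", "diagnosis": "flu"},
]
for row in anonymize(records, k=2):
    print(row)
```

Here k plays the role of the user-determined level of anonymity: a larger k forces coarser values, so each released record is consistent with more people, at the cost of detail.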
Latanya Sweeney.
Computational Disclosure Control for Medical Microdata.
Record Linkage Workshop. Bureau of the Census. Washington, DC: 1997.
Related Publications