Guaranteeing Anonymity when Sharing Medical Data,
the Datafly System

by Latanya Sweeney

Abstract

We present a computer program named Datafly that maintains anonymity in medical data by automatically generalizing, substituting, and removing information as appropriate, without losing many of the details found within the data. Decisions are made at the field and record level at the time of database access, so the approach can be used on the fly in role-based security within an institution, and in batch mode for exporting data from an institution. Organizations often release and receive medical data with all explicit identifiers, such as name, address, phone number, and Social Security number, removed, in the incorrect belief that patient confidentiality is maintained because the resulting data look anonymous. However, we show that in most of these cases the remaining data can be used to re-identify individuals, either by linking or matching the data to other databases or by looking at unique characteristics found in the fields and records of the database itself. When these less apparent aspects are taken into account, each released record can be made to map ambiguously to many possible people, providing a level of anonymity that the user determines.
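
The idea of generalizing and removing information until each released record maps to many possible people can be illustrated with a minimal sketch. This is not the Datafly algorithm itself. It assumes hypothetical fields (`zip`, `birth_year`, `diagnosis`), a hypothetical generalization rule (masking ZIP digits one at a time from the right), a user-chosen minimum group size `k`, and a hypothetical limit `max_suppressed` on how many records may be removed outright.

```python
from collections import Counter

# Hypothetical records; field names and values are illustrative only.
RECORDS = [
    {"zip": "02139", "birth_year": 1965, "diagnosis": "asthma"},
    {"zip": "02138", "birth_year": 1965, "diagnosis": "diabetes"},
    {"zip": "02141", "birth_year": 1965, "diagnosis": "hypertension"},
    {"zip": "02142", "birth_year": 1965, "diagnosis": "asthma"},
    {"zip": "90210", "birth_year": 1950, "diagnosis": "fracture"},
]

# Fields assumed to be linkable to outside databases.
QUASI_IDENTIFIERS = ("zip", "birth_year")


def qi_key(record):
    """Return the combination of quasi-identifier values for one record."""
    return tuple(record[f] for f in QUASI_IDENTIFIERS)


def generalize_zip(records, digits_kept):
    """Return copies of the records with the ZIP masked to its leading digits."""
    out = []
    for r in records:
        masked = r["zip"][:digits_kept] + "*" * (5 - digits_kept)
        out.append({**r, "zip": masked})
    return out


def anonymize(records, k, max_suppressed=1):
    """Generalize ZIP step by step, then remove the few records still too rare.

    Stops at the least generalized version in which at most max_suppressed
    records belong to a group smaller than k, and drops those records.
    """
    for digits_kept in range(5, -1, -1):
        candidate = generalize_zip(records, digits_kept)
        sizes = Counter(qi_key(r) for r in candidate)
        outliers = [r for r in candidate if sizes[qi_key(r)] < k]
        if len(outliers) <= max_suppressed:
            return [r for r in candidate if sizes[qi_key(r)] >= k]
    return []  # nothing can be released at this k


if __name__ == "__main__":
    for row in anonymize(RECORDS, k=2):
        print(row)
```

With `k=2`, the sketch masks the last ZIP digit and drops the single record that still stands alone, so every released combination of ZIP and birth year is shared by at least two records. Raising `k` forces coarser ZIP values, which reflects the trade-off between detail and anonymity that the user controls.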

Publications

  • Protection models for anonymous databases. Under review for publication.
  • Towards the collection of all the data on all the people. MIT Artificial Intelligence Working Paper, 1998.
  • Foundations of computational disclosure control. Under review for publication.
  • Commentary: researchers need not rely on consent or not. New England Journal of Medicine, 1998. (forthcoming)
  • Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression (with Pierangela Samarati). Unpublished.
  • Datafly: a system for providing anonymity in medical data. Database Security XI: Status and Prospects, T.Y. Lin and S. Qian, eds. IEEE, IFIP. New York: Chapman & Hall, 1998.
    Postscript file (238 KB)
  • Sweeney, L. Maintaining anonymity when sharing medical data, the Datafly system. MIT Artificial Intelligence Laboratory Working Paper. Cambridge: AIWP-WP344 (1997).
    Long, technical paper.
    Postscript file (1.4 MB)
    Postscript file, Compressed (380 KB)

  • Sweeney, L. Computational disclosure control for medical microdata. Record Linkage Workshop, Bureau of the Census. Washington: (1997).
    Coming to the web soon.

  • Sweeney, L. Guaranteeing anonymity when sharing medical data, the Datafly system. Proceedings, Journal of the American Medical Informatics Association. Washington, DC: Hanley & Belfus, Inc, 1997.
    Short paper. Coming to the web soon.



    Last modified 2/24/98 by sweeney@mit.edu