Privacy and Research

Commentary: Researchers need not rely on consent or not

by Latanya Sweeney

In response to: Melton LJ III. The threat to medical-records research. N Engl J Med 1997;337:1466-70.

Melton describes an environment at the Mayo Clinic in which there has been a long tradition of researchers' using patients' records in an open manner. But until recently, there existed natural limits that protected patients' privacy; technology now erodes these limits at an alarming rate. For example, the physical labor previously involved in manually reviewing records provided an economic boundary that restricted the dissemination of person-specific data. Researchers were once physically limited to the records facility itself to gather needed information, but in a globally networked society it is possible for a researcher located anywhere in the world to gain immediate electronic access to patients' files. Today's technology does pose unparalleled threats to patients' privacy, but today's technology also offers solutions.

Many details about our lives are documented on computers, and when this information is linked together, the resulting profiles can identify individual persons as accurately as fingerprints, even when the information contains no explicit identifiers such as name and address [1,2]. The increase in the availability of detailed data, as well as inexpensive technology to process it, is having a dramatic impact on research. Having more clinical information available will probably lead to more epidemiologic studies, especially since it can help ensure the validity and generalizability of specific studies. Most likely this will result in a dramatic increase in the number of records released.
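The linking risk described above can be illustrated with a minimal sketch. The records are made up and the field names (birth_year, sex, zip) are assumptions chosen for illustration, not data from any cited study. The function reports what fraction of records become unique as more fields are combined, even though no name or address is present.

```python
from collections import Counter

# Made-up records; the field names are illustrative assumptions.
records = [
    {"birth_year": 1950, "sex": "F", "zip": "02138"},
    {"birth_year": 1950, "sex": "F", "zip": "02139"},
    {"birth_year": 1950, "sex": "M", "zip": "02138"},
    {"birth_year": 1950, "sex": "F", "zip": "02138"},
    {"birth_year": 1962, "sex": "M", "zip": "02139"},
]


def unique_fraction(rows, fields):
    """Fraction of rows whose combination of values on `fields` occurs only once."""
    counts = Counter(tuple(r[f] for f in fields) for r in rows)
    singles = sum(1 for r in rows if counts[tuple(r[f] for f in fields)] == 1)
    return singles / len(rows)


# Each added field splits the population into smaller groups.
for fields in (["sex"], ["birth_year", "sex"], ["birth_year", "sex", "zip"]):
    print(fields, unique_fraction(records, fields))
```

In this toy data set no record is unique on sex alone, but most are unique once birth year and ZIP code are added; a record that is unique on such fields can be matched against any other source that carries the same fields alongside a name.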

A Harris–Equifax poll [3] implies that the public would be willing to share information for research, provided researchers and others could not identify any person included in the released data. Melton seems intent on complete access to identifiable information. But he could have conducted his hip-fracture study without identifiable data. All he needed was age, sex, diagnosis (i.e., hip fracture), and date of diagnosis for each stratum. Generalization, suppression, and anonymous linking are among the various computational techniques currently available [1,4]. These techniques are intended to release the minimal data needed in the most general format possible, ensuring confidentiality, on the one hand, and usefulness, on the other. In cases in which identifying information is required, these techniques reduce unnecessary risk.
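Generalization and suppression can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Datafly system of reference 4: the rows are made up, the field names are hypothetical, and the 10-year age band and minimum group size of 2 are arbitrary choices. Ages are generalized to bands, dates to years, and any record whose generalized combination is shared by too few patients is suppressed.

```python
from collections import Counter

# Made-up patient rows; the field names are illustrative assumptions.
patients = [
    {"age": 67, "sex": "F", "diagnosis": "hip fracture", "date": "1995-03-14"},
    {"age": 69, "sex": "F", "diagnosis": "hip fracture", "date": "1995-11-02"},
    {"age": 71, "sex": "M", "diagnosis": "hip fracture", "date": "1995-06-30"},
    {"age": 83, "sex": "F", "diagnosis": "hip fracture", "date": "1996-01-09"},
    {"age": 66, "sex": "F", "diagnosis": "hip fracture", "date": "1995-07-21"},
]


def generalize(row, band=10):
    """Replace exact age with an age band and exact date with its year."""
    low = (row["age"] // band) * band
    return {
        "age_band": f"{low}-{low + band - 1}",
        "sex": row["sex"],
        "diagnosis": row["diagnosis"],
        "year": row["date"][:4],
    }


def release(rows, min_group=2):
    """Generalize every row, then suppress rows in groups smaller than min_group."""
    general = [generalize(r) for r in rows]

    def key(g):
        return (g["age_band"], g["sex"], g["diagnosis"], g["year"])

    counts = Counter(key(g) for g in general)
    return [g for g in general if counts[key(g)] >= min_group]


for row in release(patients):
    print(row)
```

With these inputs, the three women aged 60-69 diagnosed in 1995 are released as one indistinguishable group, and the two records that would stand alone are withheld. The released fields are still the ones the text says a hip-fracture study needs: age, sex, diagnosis, and date of diagnosis.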

Fear and concern about privacy in the computer age are justified, but the options are not limited to past practices; a new spectrum of solutions is emerging. If researchers want patients to release sensitive data, they should be willing to use technology that ensures patients’ privacy within the released data.

Sweeney, L. Commentary: Researchers need not rely on consent or not. New England Journal of Medicine, 1998.


  1. Sweeney L. Weaving technology and policy together to maintain confidentiality. J Law Med Ethics 1997;25(2&3):98-110.

  2. A guide to state-level ambulatory care data collection activities. Falls Church, Va.: National Association of Health Data Organizations, 1996.

  3. Louis Harris and Associates. The Equifax-Harris consumer privacy survey. Atlanta: Equifax, 1994.

  4. Sweeney L. Datafly: a system for providing anonymity in medical data. In: Lin T, Qian S, eds. Database security XI: status and prospects. New York: IFIP/IEEE/Chapman & Hall, 1998:356-81.


Copyright © 2011. President and Fellows Harvard University.