De-identifying Text

Replacing Personally-Identifying Information in Medical Records, the Scrub System

by Latanya Sweeney.

Abstract

We define a new approach to locating and replacing personally-identifying information in unrestricted text that extends beyond straight search-and-replace procedures, and we provide techniques for minimizing risk to patient confidentiality. The straightforward approach of global search and replace properly located no more than 30-60% of all personally-identifying information that appeared explicitly in letters between physicians and notes written by clinicians within a pediatric database. On the other hand, our Scrub system found 99-100% of these references. Scrub uses detection algorithms that employ templates and specialized knowledge of what constitutes a name, address, phone number and so forth.
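The contrast drawn in the abstract, between global search-and-replace and template-based detection, can be illustrated with a small sketch. This is not the Scrub system's implementation: the regular expressions, the tiny name lexicon, the replacement tokens, and the function names (`naive_scrub`, `template_scrub`) are all assumptions chosen for illustration, and a real detector would need far richer templates and knowledge sources.

```python
import re

# Hypothetical illustration only. The templates and the name lexicon below
# are assumptions for this sketch, not those used by the Scrub system.

# Stand-in for a lexicon of known first names.
KNOWN_FIRST_NAMES = {"john", "mary", "susan"}

# Template for US-style phone numbers such as "(617) 555-0123" or "617-555-0123".
PHONE_TEMPLATE = re.compile(r"\(?\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b")

# Template for a title followed by a capitalized word, e.g. "Dr. Alvarez".
TITLED_NAME_TEMPLATE = re.compile(r"\b((?:Dr|Mr|Mrs|Ms)\.?\s+)[A-Z][a-z]+")

# Any capitalized word; checked against the lexicon before replacement.
CAPITALIZED_WORD = re.compile(r"\b[A-Z][a-z]+\b")


def naive_scrub(text, known_values):
    """Global search-and-replace: removes only strings listed in advance."""
    for value in known_values:
        text = text.replace(value, "[REDACTED]")
    return text


def template_scrub(text):
    """Replace text that matches a template for a phone number or a name."""
    text = PHONE_TEMPLATE.sub("[PHONE]", text)
    # Keep the title (group 1) and replace only the name that follows it.
    text = TITLED_NAME_TEMPLATE.sub(lambda m: m.group(1) + "[NAME]", text)

    def replace_if_known(match):
        word = match.group(0)
        return "[NAME]" if word.lower() in KNOWN_FIRST_NAMES else word

    return CAPITALIZED_WORD.sub(replace_if_known, text)


NOTE = "Dr. Alvarez saw Mary today. Call her mother at (617) 555-0123."

if __name__ == "__main__":
    # The values stored in the record's fields do not appear verbatim in the
    # note, so plain search-and-replace leaves the note unchanged.
    print(naive_scrub(NOTE, ["Mary Smith", "617-555-0123"]))
    # Prints: Dr. [NAME] saw [NAME] today. Call her mother at [PHONE].
    print(template_scrub(NOTE))
```

In this sketch, search-and-replace misses every reference because the note uses a first name alone and formats the phone number differently from the stored field, while the template pass catches them by their form rather than by their exact value.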

Citation:
Latanya Sweeney. Replacing Personally-Identifying Information in Medical Records, the Scrub System. In: Cimino, JJ, ed. Proceedings, Journal of the American Medical Informatics Association (AMIA). Washington, DC: Hanley & Belfus, Inc, 1996:333-337. (This paper was awarded First Prize at AMIA 1996.)

Copyright © 2011. President and Fellows Harvard University.