Replacing Personally-Identifying Information in Text,
the Scrub De-identification System

by Latanya Sweeney


We describe a new approach to locating and replacing personally-identifying information in unrestricted text that goes beyond straight search-and-replace procedures, and we provide techniques for minimizing risk to patient confidentiality. Global search and replace located only 30-60% of the personally-identifying information that appeared explicitly in letters between physicians and in notes written by clinicians within a pediatric database. Our Scrub system, by contrast, found 99-100% of these references. Scrub uses detection algorithms that employ templates and specialized knowledge of what constitutes a name, address, phone number, and so forth.
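
The contrast between global search and replace and template-based detection can be illustrated with a minimal sketch. This is not the Scrub implementation: the function names, the placeholder tags, and the two regular-expression templates below are all assumptions chosen for illustration, and a real system would need far richer knowledge of names, addresses, and other identifiers.

```python
import re

def search_and_replace(text, known_names):
    """Baseline: replace only exact occurrences of names already known
    (for example, from structured fields of the record)."""
    for name in known_names:
        text = re.sub(r"\b" + re.escape(name) + r"\b", "[NAME]", text)
    return text

# Hypothetical templates describing what an identifier looks like.
# These patterns are illustrative assumptions, not Scrub's own.
TEMPLATES = [
    # A US-style phone number such as 617-555-0123.
    ("[PHONE]", re.compile(r"\b\d{3}-\d{3}-\d{4}\b")),
    # A courtesy title followed by a capitalized word, e.g. "Dr. Jones".
    ("[NAME]", re.compile(r"\b(?:Dr|Mr|Mrs|Ms)\.?\s+[A-Z][a-z]+\b")),
]

def template_scrub(text, known_names):
    """Apply the baseline first, then every template in turn, so that
    identifiers absent from the known-name list can still be caught."""
    text = search_and_replace(text, known_names)
    for tag, pattern in TEMPLATES:
        text = pattern.sub(tag, text)
    return text

if __name__ == "__main__":
    note = "Dr. Jones saw Kate Smith; call 617-555-0123."
    print(search_and_replace(note, ["Kate Smith"]))
    print(template_scrub(note, ["Kate Smith"]))
```

In this sketch the baseline misses the physician's name and the phone number because neither appears in the known-name list, while the templates recognize them by their form alone. That is the general idea the abstract attributes to template-based detection, shown here only in toy form.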


  • Sweeney, L. Replacing Personally-Identifying Information in Medical Records, the Scrub System. In: Cimino, JJ, ed. Proceedings, Journal of the American Medical Informatics Association. Washington, DC: Hanley & Belfus, Inc, 1996:333-337.
    This paper was awarded First Prize at AMIA 1996.


    Last modified 3/6/97 by