Privacy-Preserving Surveillance

Towards a Privacy-Preserving Watchlist

by Latanya Sweeney


An important surveillance problem is the Watchlist Problem, which can be generally described as follows. Government authorities have an explicit list of names of known or suspected terrorists (a “watchlist”) whom they want to locate, or merely track, among the U.S. population. The government seeks to query a vast number of locations to learn whether a customer or patient bearing an identity on the watchlist has appeared at any of them. The idea is to review transactional data, such as that which results from store purchases, hotel registrations, airplane manifests, car rentals, and school attendance records. The authorities are to be notified whenever someone bearing the identity of a person on the watchlist appears at a location. More precisely, the Watchlist Problem is defined as follows.

Given (1) a set of data holders, each holding a dataset of various transactional attributes about people, and (2) a central authority holding a list of suspicious people (the “Watchlist”), the goal is to provide a system that reports occurrences of suspicious people in the transactional data to the central authority without revealing (a) the Watchlist to the data holders or (b) the identities of transaction subjects who are not on the Watchlist to the central authority.
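To make the definition concrete, the following Python sketch computes only the output the central authority is meant to receive. It is not a solution to the problem: whoever runs this function sees both the Watchlist and every transaction, which is exactly what the definition forbids. The names `Transaction` and `ideal_watchlist_output`, the exact-match rule on lowercased names, and the sample data are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    holder: str  # data holder that recorded the transaction (hypothetical field)
    name: str    # identity the person presented at that location


def ideal_watchlist_output(watchlist, transactions):
    """Return the (holder, name) pairs whose name appears on the watchlist.

    This is the information the central authority should learn and nothing
    more. A real system would have to produce it without any single party
    seeing both inputs in the clear.
    """
    wanted = {name.strip().lower() for name in watchlist}
    return sorted(
        {(t.holder, t.name) for t in transactions if t.name.strip().lower() in wanted}
    )


# Made-up sample data.
transactions = [
    Transaction("hotel", "Alice Example"),
    Transaction("airline", "Bob Sample"),
    Transaction("car_rental", "Bob Sample"),
    Transaction("hotel", "Carol Person"),
]
hits = ideal_watchlist_output(["bob sample"], transactions)
print(hits)
```

In this sketch the authority learns that "Bob Sample" appeared at two data holders and learns nothing about the other two customers; the data holders, in the ideal case, would learn nothing about who is on the list.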

This work examines two challenges lurking within the Watchlist Problem: (1) false matches and (2) trail re-identification vulnerabilities. At present, there is NO solution to the Watchlist Problem.
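As a small illustration of the first challenge, the sketch below assumes that a false match arises when a different person presents the same name as someone on the watchlist. The `Person` type, the `birth_year` attribute used to tell the two people apart, and the sample data are hypothetical and are not taken from the work itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str
    birth_year: int  # hypothetical attribute, used only to distinguish people


def name_only_matches(watchlist, customers):
    """Flag every customer whose name equals a watchlist name (case-insensitive)."""
    listed_names = {p.name.lower() for p in watchlist}
    return [c for c in customers if c.name.lower() in listed_names]


# Made-up sample data: two different people share one name.
watchlist = [Person("Bob Sample", 1960)]
customers = [
    Person("Bob Sample", 1960),     # the listed person
    Person("Bob Sample", 1987),     # a different person with the same name
    Person("Alice Example", 1975),
]

flagged = name_only_matches(watchlist, customers)
false_matches = [c for c in flagged if c not in watchlist]
print(len(flagged), "flagged,", len(false_matches), "false match(es)")
```

Matching on names alone flags both customers named "Bob Sample", even though only one of them is the listed person.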

Even when strong cryptographic hashing is used, as proposed by ANNA (Jonas 2003), trail re-identification vulnerabilities persist. In the ANNA proposal, each person’s name is hashed along with other possible spellings of the name, so that a set of hash values results rather than a single hashed value. These hashed sets, rather than the actual names, are shared and compared. This proposal provides false privacy protection, because credit card transactions, loyalty card use, and other explicitly identifying information can be used, through trail re-identification, to link the hashed sets to the named identities that generated them.
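The following Python sketch illustrates both halves of this argument under simplifying assumptions. It is not the ANNA implementation: the spelling-variant rule in `name_variants`, the use of SHA-256, comparison by set intersection, and the sample visits are all stand-ins chosen for illustration. The second half shows a simplified form of trail re-identification, in which a hashed set is linked to a name when the set of locations where the hashed set appears matches the location trail of exactly one named identity (as might be obtained from credit card or loyalty card records at the same locations).

```python
import hashlib
from collections import defaultdict


def name_variants(name):
    """Hypothetical spelling variants; a real system would use a richer model."""
    base = " ".join(name.lower().split())
    parts = base.split(" ")
    variants = {base, base.replace(" ", "")}
    if len(parts) >= 2:
        variants.add(f"{parts[-1]} {parts[0]}")     # "last first" ordering
        variants.add(f"{parts[0][0]} {parts[-1]}")  # first initial plus last name
    return variants


def hashed_set(name):
    """Set of SHA-256 digests, one per spelling variant of the name."""
    return frozenset(
        hashlib.sha256(v.encode("utf-8")).hexdigest() for v in name_variants(name)
    )


def trails(observations):
    """Map each observed key to the set of locations where it was seen."""
    seen = defaultdict(set)
    for location, key in observations:
        seen[key].add(location)
    return {key: frozenset(locs) for key, locs in seen.items()}


def reidentify(hashed_obs, named_obs):
    """Link a hashed set to a name when exactly one name shares its trail."""
    names_by_trail = defaultdict(list)
    for name, trail in trails(named_obs).items():
        names_by_trail[trail].append(name)
    links = {}
    for hset, trail in trails(hashed_obs).items():
        candidates = names_by_trail.get(trail, [])
        if len(candidates) == 1:
            links[hset] = candidates[0]
    return links


# Made-up visits: (location, name presented).
visits = [
    ("hotel", "Alice Example"), ("airline", "Alice Example"),
    ("airline", "Bob Sample"), ("car_rental", "Bob Sample"),
    ("hotel", "Carol Person"), ("car_rental", "Carol Person"),
]
hashed_obs = [(loc, hashed_set(name)) for loc, name in visits]  # what is shared
named_obs = visits  # identifying records held at the same locations

# Hashed sets of two spellings overlap, so they compare as a match.
same_person = bool(hashed_set("Bob Sample") & hashed_set("sample bob"))

# No hash is inverted here; the location trails alone link hashes to names.
links = reidentify(hashed_obs, named_obs)
print(len(links), "of", len(trails(hashed_obs)), "hashed sets linked to a name")
```

In this toy data every person has a distinct location trail, so every hashed set is linked back to a name without breaking the hash function.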

Keywords: homeland security, privacy-preserving surveillance, law-enforcement, entity resolution, identity matching, artificial intelligence

L. Sweeney. Towards a Privacy-Preserving Watchlist Solution. AAAI Spring Symposium, AI Technologies for Homeland Security, 2005. (Poster)

Related Publications

  • L. Sweeney. "Privacy Technologies for Homeland Security", Testimony before the Privacy and Integrity Advisory Committee of the Department of Homeland Security (“DHS”), Boston, MA, June 15, 2005. (Testimony and Appendices)

  • L. Sweeney. Privacy-Enhanced Linking. ACM SIGKDD Explorations 7(2) December 2005. (PDF).

  • L. Sweeney. Privacy-Preserving Surveillance using Databases from Daily Life. IEEE Intelligent Systems, 20 (5), September-October 2005. Earlier version: Privacy-Preserving Surveillance Using Selective Revelation. Carnegie Mellon University, LIDAP Working Paper 15, February 2005. (PDF).

Copyright © 2011. President and Fellows Harvard University.