Trails Learning Project
This paper provides algorithms for learning the identities of
individuals from the trails of seemingly anonymous information
they leave behind. Consider online consumers, who have the IP
addresses of their computers logged at each website visited.
Many falsely believe they cannot be identified. The term "re-identification"
refers to correctly relating seemingly anonymous
data to explicitly identifying information (such as the name or
address) of the person who is the subject of those data. Re-identification
has historically been associated with data released
from a single data holder. This paper extends the concept to "trail
re-identification" in which a person is related to a trail of
seemingly anonymous and homogeneous data left across different
locations. The three novel algorithms presented in this paper perform
trail re-identifications by exploiting the fact that some locations
also capture explicitly identifying information and subsequently
provide the unidentified data and the identified data as separate
data releases. Intersecting occurrences in these two kinds of data
can reveal identities. For example, one online consumer may visit
50 websites and purchase at 5, while another may visit 30 sites and
purchase at 7. Shared visit logs provide unidentified data.
Exchanged customer lists provide identified data. The algorithms
presented herein re-identify individuals based on the uniqueness
of trails across unidentified and identified datasets. The
algorithms differ in the degree of completeness and multiplicity
assumed in the data. Successful re-identifications are reported for
DNA sequences left by hospital patients and for IP addresses left
by online consumers. These algorithms are extensible to tracking
collocations of people, which is an objective of homeland defense.
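
The intersection idea described above can be illustrated with a minimal sketch. This is not one of the three algorithms from the paper. It assumes that each identified trail (the sites where a named customer purchased) is a subset of that person's unidentified trail (the sites an IP address visited), and that each person corresponds to exactly one IP address. The function name reidentify and the toy data are hypothetical.

```python
def reidentify(identified, unidentified):
    """Link names to anonymous IDs by trail containment.

    identified:   dict mapping a name to the set of locations that hold
                  identified data about that person (e.g. purchases).
    unidentified: dict mapping an anonymous ID to the set of locations
                  that hold unidentified data about it (e.g. visits).

    Assumption (not taken from the paper): a name is linked to an
    anonymous ID when exactly one remaining anonymous trail contains
    every location in the named trail. Linked IDs are removed and the
    search repeats until no new link is found.
    """
    matches = {}
    remaining = dict(unidentified)
    progress = True
    while progress:
        progress = False
        for name, named_trail in identified.items():
            if name in matches:
                continue
            # Anonymous trails that are supersets of the named trail.
            candidates = [anon_id for anon_id, anon_trail in remaining.items()
                          if named_trail <= anon_trail]
            if len(candidates) == 1:
                matches[name] = candidates[0]
                del remaining[candidates[0]]
                progress = True
    return matches


# Hypothetical toy data: visit logs keyed by IP, customer lists keyed by name.
visits = {
    "ip_1": {"siteA", "siteB", "siteC"},
    "ip_2": {"siteB", "siteD"},
    "ip_3": {"siteA", "siteD", "siteE"},
}
purchases = {
    "Alice": {"siteA", "siteC"},
    "Bob": {"siteD"},
    "Carol": {"siteE"},
}

result = reidentify(purchases, visits)
print(result)
```

In this toy data, Bob's single purchase at siteD is consistent with two IP addresses at first. Once Carol is linked to ip_3, only ip_2 remains as a candidate, so Bob is linked on the next pass. Real data that violates the subset or one-to-one assumptions would need a different treatment.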
Bradley Malin, Latanya Sweeney, and Elaine Newton. Trail Re-Identification: Learning Who You Are From Where You Have Been. LIDAP-WP12. Carnegie Mellon University, Laboratory for International Data Privacy, Pittsburgh, PA: March 2003.