Detecting Email Aliases Using Social Networks
by Ralf Holzer
Keywords: entity resolution, alias detection, collocation, social networks
Abstract
This research addresses the problem of correctly relating aliases
that belong to the same entity. Previous approaches focused on
natural language processing and structured data, whereas in this
research we analyze the local association, or “social,” network in
which aliases reside. The network is constructed from email data
mined from the Internet. Links in the network represent web pages
on which two email addresses are collocated. The problem is defined
as follows: given a social network S, constructed from email address
collocations, and an email address E, identify any aliases of E
that also appear in S. The alias detection methods are evaluated on
a data set of over 14,000 University X email addresses for which
ground truth relations are known. The results are reported as partial
lists of k candidate aliases, ranked by predicted relational strength
within the network. When each email address serves as the source and
only the top-ranked candidate is considered, 2% of all email addresses
are correctly linked to another alias of the same entity. This is
significantly better than the random (0.007%) and geodesic distance
(1%) baseline predictions. The share of correct linkages increases to
15% within the top-10 rank list (0.07% of all emails) and to 30%
within the top-100 rank list (0.7% of all emails).
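The abstract does not spell out how relational strength is scored, so the Python sketch below only illustrates the overall pipeline under stated assumptions. It builds a collocation graph from a hypothetical `pages` mapping (page identifier to the addresses found on that page), ranks candidates by geodesic (hop) distance as in the baseline, ranks them by a common-neighbor count as a hypothetical stand-in for the paper's relational-strength measure, and scores each ranking by the fraction of sources whose top-k list contains a known alias. All function names and the toy addresses are invented for illustration. The random-baseline helper assumes exactly one true alias per source; with roughly 14,000 addresses, k/(N-1) gives about 0.007% for k = 1, 0.07% for k = 10 and 0.7% for k = 100, consistent with the percentages quoted in parentheses above.

```python
from collections import deque
from itertools import combinations


def build_collocation_graph(pages):
    """Build an undirected, weighted graph of email addresses.

    pages: dict mapping a page identifier to the addresses found on it.
    Returns dict email -> dict neighbor -> number of pages both appear on.
    """
    graph = {}
    for emails in pages.values():
        # Every pair of addresses on the same page is collocated.
        for a, b in combinations(sorted(set(emails)), 2):
            graph.setdefault(a, {})
            graph.setdefault(b, {})
            graph[a][b] = graph[a].get(b, 0) + 1
            graph[b][a] = graph[b].get(a, 0) + 1
    return graph


def rank_by_geodesic(graph, source):
    """Baseline: rank reachable addresses by hop distance (BFS) from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, {}):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    del dist[source]
    # Closest first; ties broken alphabetically for a deterministic order.
    return sorted(dist, key=lambda e: (dist[e], e))


def rank_by_shared_neighbors(graph, source):
    """Hypothetical stand-in for relational strength: rank the addresses that
    share at least one neighbor with source by how many neighbors they share."""
    scores = {}
    for nbr in graph.get(source, {}):
        for cand in graph[nbr]:
            if cand != source:
                scores[cand] = scores.get(cand, 0) + 1
    # Highest shared-neighbor count first; ties broken alphabetically.
    return sorted(scores, key=lambda e: (-scores[e], e))


def top_k_hit_rate(graph, aliases, rank_fn, k):
    """Fraction of sources whose top-k list contains at least one known alias.

    aliases: ground truth, dict email -> set of that entity's other addresses.
    """
    hits = sum(
        1 for source, known in aliases.items()
        if known & set(rank_fn(graph, source)[:k])
    )
    return hits / len(aliases)


def random_top_k(num_addresses, k):
    """Expected top-k hit rate of uniform random guessing, assuming exactly
    one true alias among the other num_addresses - 1 addresses."""
    return k / (num_addresses - 1)


# Toy, made-up data: alice@ and alice.smith@ are aliases of one person.
pages = {
    "page1": {"alice@x.edu", "bob@x.edu", "carol@x.edu"},
    "page2": {"alice.smith@x.edu", "bob@x.edu", "carol@x.edu"},
    "page3": {"carol@x.edu", "dave@x.edu"},
}
aliases = {
    "alice@x.edu": {"alice.smith@x.edu"},
    "alice.smith@x.edu": {"alice@x.edu"},
}
g = build_collocation_graph(pages)
print(rank_by_geodesic(g, "alice@x.edu"))
# ['bob@x.edu', 'carol@x.edu', 'alice.smith@x.edu', 'dave@x.edu']
print(rank_by_shared_neighbors(g, "alice@x.edu"))
# ['alice.smith@x.edu', 'bob@x.edu', 'carol@x.edu', 'dave@x.edu']
print(top_k_hit_rate(g, aliases, rank_by_geodesic, 1))          # 0.0
print(top_k_hit_rate(g, aliases, rank_by_shared_neighbors, 1))  # 1.0
print(random_top_k(14_000, 1))  # about 7.1e-05, i.e. roughly 0.007%
```

On this toy graph the geodesic baseline places the direct contacts bob and carol ahead of the alias, which sits two hops away, while the common-neighbor score ranks the alias first because both addresses appear alongside the same contacts. Whether such a score helps on real collocation data is an empirical question of the kind the evaluation above addresses.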
R. Holzer, B. Malin, and L. Sweeney. Email Alias Detection Using Social Network Analysis. Proceedings of the ACM SIGKDD Workshop on Link Discovery (LinkKDD): Issues, Approaches, and Applications. Chicago, IL. August 2005.