Detecting Email Aliases Using Social Networks

Email Alias Detection Using Social Network Analysis

by Ralf Holzer


This research addresses the problem of correctly relating aliases that belong to the same entity. Previous approaches focused on natural language processing and structured data, whereas this research analyzes the local association, or “social,” network in which aliases reside. The network is constructed from email data mined from the Internet. Links in the network represent web pages on which two email addresses are collocated. The problem is defined as follows: given a social network S constructed from email address collocations and an email address E, identify any aliases for E that also appear in S. The alias detection methods are evaluated on a data set of over 14,000 University X email addresses for which ground truth relations are known. The results are reported as partial lists of k choices for possible aliases, ranked by predicted relational strength within the network. Given a source email address, 2% of all email addresses are correctly linked by best rank to another alias that corresponds to the same entity, which is significantly better than both a random prediction (0.007%) and a geodesic distance baseline prediction (1%). Correct linkages increase to 15% and 30% within top-10 (0.07% of all emails) and top-100 rank lists (0.7% of all emails), respectively.
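
The network construction and the geodesic distance baseline described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' alias detection method: the function names, the example addresses, and the alphabetical tie-breaking rule are all hypothetical, and the ranking uses plain hop count, which is the baseline the abstract compares against.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_network(pages):
    """Link every pair of email addresses that appear on the same page.

    `pages` is an iterable of address lists, one list per web page
    (a hypothetical input format chosen for this sketch).
    """
    graph = defaultdict(set)
    for addresses in pages:
        unique = sorted(set(addresses))
        for address in unique:
            graph[address]  # register addresses that appear alone on a page
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def geodesic_distances(graph, source):
    """Breadth-first search: hop count from `source` to each reachable address."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def rank_candidates(graph, source, k):
    """Return up to k candidate aliases for `source`, nearest first.

    Ties are broken alphabetically, an arbitrary choice made for this sketch.
    """
    dist = geodesic_distances(graph, source)
    candidates = sorted((d, addr) for addr, d in dist.items() if addr != source)
    return [addr for _, addr in candidates[:k]]


# Hypothetical pages, each listing the addresses found on it.
example_pages = [
    ["alice@cs.example.edu", "bob@example.edu"],
    ["bob@example.edu", "alice@alumni.example.org"],
    ["carol@example.edu", "alice@alumni.example.org"],
]
example_network = build_network(example_pages)
top_two = rank_candidates(example_network, "alice@cs.example.edu", 2)
```

Here `top_two` is a rank list with k = 2 for the source address. Evaluating such a list against ground truth means checking whether a known alias of the source appears among the k returned addresses.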

Keywords: entity resolution, alias detection, colocation, social networks

R. Holzer, B. Malin, and L. Sweeney. Email Alias Detection Using Social Network Analysis. Proceedings of the ACM SIGKDD Workshop on Link Discovery (LinkKDD): Issues, Approaches, and Applications. Chicago, IL. August 2005.

Related Links

Spring 2005 [Data Privacy Lab]