Unsolicited communications currently account for over sixty percent of all sent e-mail, with projections reaching the mid-eighties. While much spam is innocuous, a portion is engineered by criminals to prey upon, or scam, unsuspecting people. The senders of scam spam attempt to mask their messages as non-spam and con recipients through a range of tactics, including pyramid schemes, securities fraud, and identity theft via phishing mechanisms (e.g., faux PayPal or AOL websites). To lessen the suspicion of fraudulent activity, the same individual, or collaborating group, augments the text of its messages and assumes an endless number of pseudonyms with an equal number of different stories. In this paper, we introduce ScamSlam, a software system designed to learn the underlying number of criminal cells perpetrating a particular type of scam, as well as to identify which scam spam messages were written by which cell. The system consists of two main components: 1) a filtering mechanism based on a Poisson classifier to separate scam from general spam and non-spam messages, and 2) a message normalization and clustering technique to relate scam messages to one another. We apply ScamSlam to a corpus of approximately 500 scam messages communicating the "Nigerian" advance fee fraud. The scam filtration method filters out more than 99% of scam messages, vastly outperforming well-known spam filtering software, which catches only 82% of the scam messages. Through the clustering component, we discover that at least half of all scam messages are accounted for by 20 individuals or collaborating groups.
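As a rough illustration of the two components named in the abstract, the following Python sketch pairs a word-count Poisson classifier with threshold-based single linkage clustering. It is not the ScamSlam implementation: the abstract does not specify the features, the normalization, or the distance measure, so the tokenizer, the add-one smoothing, the Jaccard distance, and every name here (`tokenize`, `PoissonClassifier`, `single_linkage`) are assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Crude normalization (assumed): lowercase, keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


class PoissonClassifier:
    """Each word count is modeled as Poisson with a class-specific rate
    proportional to message length."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha   # smoothing constant (assumed)
        self.rates = {}      # label -> {word: rate per token}
        self.priors = {}     # label -> prior probability
        self.vocab = set()

    def fit(self, docs, labels):
        counts, totals = {}, Counter()
        for text, label in zip(docs, labels):
            toks = tokenize(text)
            counts.setdefault(label, Counter()).update(toks)
            totals[label] += len(toks)
            self.vocab.update(toks)
        label_counts = Counter(labels)
        for label, c in counts.items():
            denom = totals[label] + self.alpha * len(self.vocab)
            self.rates[label] = {w: (c[w] + self.alpha) / denom for w in self.vocab}
            self.priors[label] = label_counts[label] / len(labels)
        return self

    def log_score(self, text, label):
        toks = [t for t in tokenize(text) if t in self.vocab]
        n = max(len(toks), 1)
        x = Counter(toks)
        score = math.log(self.priors[label])
        for w, rate in self.rates[label].items():
            mu = n * rate                        # expected count of w
            score += x[w] * math.log(mu) - mu    # log(x!) is class-independent
        return score

    def predict(self, text):
        return max(self.rates, key=lambda label: self.log_score(text, label))


def jaccard_distance(a, b):
    """Distance between two messages as 1 - Jaccard overlap of token sets."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)


def single_linkage(messages, threshold):
    """Single linkage cut at `threshold`: two messages share a cluster if a
    chain of pairwise distances <= threshold connects them (union-find)."""
    parent = list(range(len(messages)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(messages)):
        for j in range(i + 1, len(messages)):
            if jaccard_distance(messages[i], messages[j]) <= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(messages)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

In this sketch the classifier would first separate scam from other mail, and `single_linkage` would then be run on the messages labeled as scam; each resulting cluster stands in for one hypothesized author or collaborating group. The threshold is a free parameter that would need tuning on real data.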
Keywords: Spam, scam, Internet fraud, e-mail filtering, text analysis, text classification, Poisson classification models, single linkage clustering, information retrieval, semantic learning
E. Airoldi and B. Malin. ScamSlam: An Architecture for Learning the Criminal Relations Behind Scam Spam. Carnegie Mellon University, School of Computer Science, Technical Report CMU-ISRI-04-121. Pittsburgh: May 2004. (PDF)