ScamSlam Project

Data mining challenges for electronic safety: the case of fraudulent intent detection in e-mails

Online criminals have adapted traditional snail-mail and door-to-door fraudulent schemes into electronic form. Increasingly, such schemes target an individual's personal e-mail, where they mingle among, and are masked by, honest communications. The targeting and conniving nature of these schemes is an infringement upon an individual's personal privacy, as well as a threat to personal safety. In this paper, we introduce an array of challenges that are ripe for the attention of the data mining research community and are vastly different from those of combating the general problem of spam. We illustrate how state-of-the-art spam filtering systems fail to capture fraudulent intent hidden in the text of e-mails, but demonstrate how more robust systems can be engineered using existing data mining tools. We conclude by examining a specific scheme, the Nigerian 4-1-9 advance fee fraud scam, for which we design a learning system capable of accurately identifying the fraudulent intent within an e-mail. Our system is applicable to fraud detection and can serve as a guide for law enforcement agencies in cyber-investigations.

Keywords: Spam, scam, Internet fraud, e-mail filtering, text analysis, text classification, Poisson classification models, single linkage clustering, information retrieval, semantic learning

E. Airoldi and B. Malin. Data mining challenges for electronic safety: the case of fraudulent intent detection in e-mails. In Proceedings of the Workshop on Privacy and Security Aspects of Data Mining, in conjunction with the IEEE International Conference on Data Mining. Brighton, England: November 2004.

Fall 2005 Data Privacy Lab