Topics in Privacy

Topics in Privacy (TIP) consists of weekly discussions and brainstorming sessions on all aspects of privacy. Discussions are often inspired by a real-world privacy problem being faced by the lead discussant, who may be from industry, government, or academia. Practice talks and presentations on specific techniques and topics are also common.

The following schedule and descriptions are tentative. Topics are usually not posted earlier than the week before.


Schedule (Most Recent Only)

No.  Date  Description
1    6/1   Anonymous Internet Communications Using Mix Networks
2    6/8   The Art and Science of Privacy
3    6/22  Erroneous Fingerprint Matches
4    6/29  Learning the Criminal Relations Behind Scam Spam
5    7/6   Outsourcing in Compliance with HIPAA and the Role of Privacy Technology
6    7/20  Usable Privacy and Security Software
7    7/27  Open Discussion
8    8/2   The Economic Potentials of RFID/EPC in Retail Industry
9    8/9   Sentiment Extraction from Unstructured Text using Tabu Search-Enhanced Markov Blanket
10   8/20  Does privacy-preserving data mining have anything to do with privacy?
11   8/24  Current Graduate Students Talk with New Graduate Students about Computer Science Research in Computation, Organizations and Society

Abstracts of Talks and Discussions

  1. Anonymous Internet Communications Using Mix Networks
    A "mix" network is the term given to a network of machines that enable anonymous Internet communications. The goal is to provide IP anonymity to the sender of the communication.  Traditional methods for evaluating the amount of anonymity afforded by various Mix configurations have depended either on measuring the size of the set of possible senders of a particular message (the anonymity set size) or on measuring the entropy associated with the probability distribution of the message's possible senders.  This work further explores an alternative way of assessing the anonymity of a Mix system by considering the capacity of a covert channel from a sender behind the Mix to an observer of the Mix's output.  For more information, see "Anonymity and Covert Channels in Simple Timed Mix-firewalls" by R. Newman, V. Nalla, and I. Moskowitz.  Fourth Workshop on Privacy Enhancing Technologies (PET).  Toronto, Canada, May 2004. [Presenter: R. Newman, University of Florida]
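
The two traditional measures mentioned above can be illustrated with a short sketch. It is not taken from the cited paper: the function names and the two example distributions are hypothetical, and it assumes the observer's beliefs are given as one probability per candidate sender.

```python
import math
from typing import Sequence


def anonymity_set_size(probs: Sequence[float]) -> int:
    """Count the senders that could have sent the message (non-zero probability)."""
    return sum(1 for p in probs if p > 0)


def sender_entropy_bits(probs: Sequence[float]) -> float:
    """Shannon entropy, in bits, of the distribution over possible senders."""
    if not math.isclose(sum(probs), 1.0, rel_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    # Terms with p == 0 contribute nothing and are skipped to avoid log2(0).
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical observer beliefs about four candidate senders.
uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.85, 0.05, 0.05, 0.05]

for name, dist in (("uniform", uniform), ("skewed", skewed)):
    print(name, anonymity_set_size(dist), round(sender_entropy_bits(dist), 3))
```

Both distributions have the same anonymity set size, but the skewed one has lower entropy, which illustrates how the set size alone can overstate the anonymity actually provided.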

  2. The Art and Science of Privacy
    Ever since Brooke Singer (Carnegie Mellon and Data Privacy Lab 2002) did her graduate thesis in art on data privacy, the Lab has had several requests from museums for more work that enables informal, interactive learning on privacy.  Recently, another opportunity for the Lab to acquire some funds to do such work emerged.  At today's brainstorming session, we will explore possible projects, mediums to use, and the on-going intimate relationship between art, politics, and the science of privacy.  Of historical importance, as evidenced in part by Brooke's work, are algorithms that enable others to put data fragments together to learn sensitive information from disparate data sources.  Is the world now ready to move beyond sensational shock factors about what is available and also learn about solutions?  Are on-line demonstrations better?  What privacy topics should be examined?  What are the important issues to learn?  What are the guidelines or evaluation criteria by which these projects should be measured? [Presenter: L. Sweeney]

  3. Erroneous Fingerprint Matches
    In response to a May 22, 2004 Associated Press article by Andrew Kramer, entitled "Experts say erroneous fingerprint matches likely to become more common," discussion emerged on the local mailing list about the nature of fingerprint matching.  Dr. Weedn identified two reasons beyond those expressed in the article as to why the error rate may climb: (1) the workload on examiners has escalated dramatically as they are called to do more background checks; and (2) there are probably more errors than recognized, and greater scrutiny will lead to the discovery of more cases.  Dr. Weedn further points out that the article suggests that humans prevent errors; while this may be true, they can also sanction a misidentification, inadvertently or not.  Dr. Sweeney expressed concern over the use of minutiae rather than the entire (or partial) image.  She points out that a few point areas (12, 18 or 24) in a fingerprint make the decision, which seems even more suspicious as the number of fingerprints in the gallery grows to include so many in the population.  This notion was challenged by the claim that more information may exist in the minutiae than in just the primary friction ridge pattern. With this start, the brainstorming session today will focus on current fingerprint matching techniques and where weaknesses may reside. A copy of the article is available at
    https://www.kgw.com/sharedcontent/APStories/stories/D82MR7D80.html. [Presenter: Vic Weedn, MD JD]

  4. Learning the Criminal Relations Behind Scam Spam
    Unsolicited communications currently account for over sixty percent of all sent e-mail, with projections reaching the mid-eighties. While much spam is innocuous, a portion is engineered by criminals to prey upon, or scam, unsuspecting people. The senders of scam spam attempt to mask their messages as non-spam and con through a range of tactics, including pyramid schemes, securities fraud, and identity theft via phisher mechanisms (e.g., faux PayPal or AOL websites). To lessen the suspicion of fraudulent activities, scam messages sent by the same individual, or collaborating group, augment the text of their messages and assume an endless number of pseudonyms with an equal number of different stories. In this paper, we introduce ScamSlam, a software system designed to learn the underlying number of criminal cells perpetrating a particular type of scam, as well as to identify which scam spam messages were written by which cells. The system consists of two main components: 1) a filtering mechanism based on a Poisson classifier to separate scam from general spam and non-spam messages, and 2) a message normalization and clustering technique to relate scam messages to one another. We apply ScamSlam to a corpus of approximately 500 scam messages communicating the Nigerian advance fee fraud. The scam filtration method filters out more than 99% of scam messages, vastly outperforming well-known spam filtering software, which catches only 82% of the scam messages. Through the clustering component, we discover that at least half of all scam messages are accounted for by 20 individuals or collaborating groups. [Presenter: Edoardo Airoldi]
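
As a rough illustration of the filtering idea, the sketch below scores a message under a per-class Poisson model of word counts and picks the higher-scoring class. It is not the ScamSlam implementation: the whitespace tokenization, additive smoothing, equal class priors, function names, and toy training messages are all assumptions made for the example.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Rates = Dict[str, Dict[str, float]]


def train_poisson(docs_by_class: Dict[str, List[str]],
                  alpha: float = 0.5) -> Tuple[List[str], Rates]:
    """Estimate a per-token occurrence rate for every vocabulary word in every class."""
    vocab = sorted({w for docs in docs_by_class.values()
                    for d in docs for w in d.lower().split()})
    rates: Rates = {}
    for label, docs in docs_by_class.items():
        counts = Counter(w for d in docs for w in d.lower().split())
        total = sum(counts.values())
        # Additive smoothing keeps every rate strictly positive.
        rates[label] = {w: (counts[w] + alpha) / (total + alpha * len(vocab))
                        for w in vocab}
    return vocab, rates


def classify(text: str, vocab: List[str], rates: Rates) -> str:
    """Return the class whose Poisson word-count model gives the highest log-likelihood."""
    tokens = text.lower().split()
    if not tokens:
        raise ValueError("cannot classify an empty message")
    counts = Counter(tokens)
    n = len(tokens)
    scores = {}
    for label, class_rates in rates.items():
        score = 0.0
        for w in vocab:
            mean = class_rates[w] * n  # expected count of w in a message of n tokens
            # log Poisson(x; mean) without the -log(x!) term, which is identical for every class
            score += counts[w] * math.log(mean) - mean
        scores[label] = score
    return max(scores, key=scores.get)


# Toy training data, invented for this example.
training = {
    "scam": [
        "urgent transfer of funds from the bank of nigeria",
        "i need your assistance to transfer the sum of million dollars",
        "confidential business proposal transfer funds to your account",
    ],
    "other": [
        "meeting moved to friday at noon",
        "buy cheap watches online today",
        "please review the attached draft of the paper",
    ],
}

vocab, rates = train_poisson(training)
print(classify("please assist with urgent transfer of funds", vocab, rates))
print(classify("meeting moved to friday", vocab, rates))
```

Words that never appeared in training are ignored when scoring, so a real filter would need a much larger corpus and a considered treatment of unseen vocabulary.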

  5. Outsourcing in Compliance with HIPAA and the Role of Privacy Technology
    There has been a plethora of articles on outsourcing, and drafts of various bills placing limits on outsourcing are circulating in Congress.  Most of the attention is placed on offshoring and the related loss of American jobs.  However, there are many privacy concerns as well.  This discussion will examine privacy problems related to the sharing of medical data for outsourcing and offshoring.  Questions that will be explored include the following.  What are the kinds of data and uses for outsourcing?  What are the privacy risks?  How do these relate to HIPAA (the new medical privacy legislation)?  What role can technology and privacy technology play? [Presenter: Latanya Sweeney]

  6. Usable Privacy and Security Software
    Earlier this year, Lorrie Cranor launched the CMU Usable Privacy and Security Lab (CUPS).  She also led a Workshop on Usable Privacy and Security Software earlier this month.  In this TIP session, Lorrie will describe CUPS and work in this area.  More information about the new lab is available at
    cups.cs.cmu.edu. [Presenter: Lorrie Cranor]

  7. Open Discussion
    This session is dedicated to discussing topics of interest to those who attend.  Topics may include recent events in the press, general privacy research trends and/or concerns, or work currently underway.  At least once or twice a semester, we try to have these kinds of open discussion sessions because they allow more topics to be covered in a session than is possible when a full session is dedicated to one topic.  Please attend and brainstorm with us.

  8. The Economic Potentials of RFID/EPC in Retail Industry
    The use of mobile information systems and communication technology furthers the informatization and automation of operating processes. In the retail industry, mobile communication technology affords new forms of customer communication by establishing electronic and individually designable channels of communication. In conjunction with powerful information systems, these options enable both improved services for customers and an increase in turnover and profit through more flexible and individualized pricing and communication policy. This contribution illustrates the forms of pricing policy using the "Extra Future Store" of METRO Group as a case study and identifies two determinants of success for these new types of electronic customer communications: the power of disposal over the end device and the range of communication. [Presenters: Dr. Jens Strueker and Dr. Stefan Sackmann, Institute of Computer Science and Social Studies, University of Freiburg]

  9. Sentiment Extraction from Unstructured Text using Tabu Search-Enhanced Markov Blanket
    Extracting sentiments from unstructured text has emerged as an important problem in many disciplines. An accurate method would enable us, for example, to mine on-line opinions from the Internet and learn customers' preferences for economic or marketing research, or for leveraging a strategic advantage. In this paper, we propose a two-stage Bayesian algorithm that is able to capture the dependencies among words and, at the same time, find a vocabulary that is efficient for the purpose of extracting sentiments. Experimental results on the Movie Reviews data set show that our algorithm is able to select a parsimonious feature set with substantially fewer predictor variables than in the full data set and leads to better predictions about sentiment orientations than several state-of-the-art machine learning methods. Our findings suggest that sentiments are captured by conditional dependence relations among words, rather than by keywords or high-frequency words. More information and a copy of the paper are available at
    dataprivacylab.org/dataprivacy/projects/sentiment/. [Presenter: Edoardo Airoldi]

  10. Does privacy-preserving data mining have anything to do with privacy?
    Introductions and updates on recent work, followed by open discussion and debate about "privacy" in the body of work known as privacy-preserving data mining. [Presenter: Chris Clifton, Purdue University]

  11. Current Graduate Students Talk with New Graduate Students about Computer Science Research in Computation, Organizations and Society
    The newest PhD Program in Carnegie Mellon's School of Computer Science is the PhD Program in Computation, Organizations and Society (COS). In this session, existing COS students talk about their research and the research process.  COS students entering this Fall semester talk about their research interests and expectations. Others are invited to share research experiences with new and developing students. More information about the new COS PhD program is available at
    www.cos.cs.cmu.edu.

Summer 2004 Data Privacy Laboratory [LIDAP@dataprivacylab.org]