Topics in Privacy

Topics in Privacy (TIP) consists of weekly discussions and brainstorming sessions on all aspects of privacy. Discussions are often inspired by a real-world privacy problem being faced by the lead discussant, who may be from industry, government, or academia. Practice talks and presentations on specific techniques and topics are also common.

The following schedule and descriptions are tentative. Topics are usually not posted earlier than the week before.


Schedule (SUMMER 2003)

#    Date   Description
1    6/9    Real-Time De-identification of Data for Bio-Surveillance
2    6/16   Privacy Issues Related to Sharing Email Messages for Research on DARPA's EPCA Program
3    6/23   World-Wide Reference Resolution
4    6/30   Computer Science Research Butts Privacy I
5    7/7    Computer Science Research Butts Privacy II
6    7/14   Training Chief Privacy Officers
7    7/21   Privacy Rights in the KALI Project
8    7/28   Computer Science Research and the IRB Process
9    8/4    k-Anonymous Messaging
10   8/11   IBM's Enterprise Privacy Authorization Language [EPAL]

Abstracts of Talks and Discussions

  1. Real-Time De-identification of Data for Bio-Surveillance
    Today in the U.S., efforts aimed at the surveillance of individuals are unprecedented. The time is ripe for technology that enables data sharing while providing scientific assurances of privacy protection. In this talk, we will examine a new effort, soon to be deployed in real-world practice, for sharing data for bio-terrorism surveillance. We will look at this data sharing environment, the standard of sufficient de-identification allowed under the new Privacy Rule (HIPAA), and the usefulness of the resulting data. We will show that, using privacy technology, namely a CertBox, medical data can be sufficiently de-identified under the Privacy Rule's scientific standard for de-identification, allowing the data to be shared freely while remaining useful for bio-terrorism surveillance. [Presenter: L. Sweeney]

  2. Privacy Issues Related to Sharing Email Messages for Research on DARPA's EPCA Program
    DARPA's new EPCA program is spending approximately $100M over five years to produce an "enduring personal cognitive assistant" which will:
    • assist the user in office automation tasks such as scheduling meetings, tracking and organizing activities, distributing and summarizing information, etc.;
    • sense a user's environment by analyzing email, IM inputs, calendar appointments, web usage and, in some cases, sensor data from cameras and microphones, etc.;
    • adapt over time to a particular user by learning preferences, building models of the user's activities, understanding the user's social connections, etc.
    The privacy issues are enormous and, despite the five-year timespan, immediate. One short-term issue, for instance, is safely collecting and distributing realistic email data. At the TIP meeting, the general research tasks will be described and participants will do some privacy issue spotting. Discussion will focus on possible privacy problems and solutions surrounding the collection and sharing of email messages needed for research purposes. [Presenter: W. Cohen]

  3. World-Wide Reference Resolution
    Reference resolution is the problem of (a) determining whether two strings refer to the same named entity in a particular context and (b) determining all the entities a particular string references in a particular context. For example, in the CMU context, Latanya and Dr. Sweeney refer to the same named entity with high probability, while the string Tomasic refers to at least two entities. This meeting considers the construction of an artifact that performs reference resolution for the planet. The purpose of the meeting is to elicit comments on the feasibility of such an artifact and to consider possible algorithms for its implementation. The TIP meeting will engage in brainstorming on strategies to accomplish this goal. RosterFinder, a new algorithm from the Lab that locates lists of people on the Web, addresses a closely related problem and may be helpful. [Presenter: A. Tomasic]

  4. Computer Science Research Butts Privacy I
  5. Computer Science Research Butts Privacy II
    Computer Science research and practice are raising growing privacy concerns among the public and government. Our increasing ability to capture, organize, interpret and share data about individuals raises questions about what we should be doing as a field, and what CMU should do in particular. These issues are already very real in ongoing SCS research projects, from mining databases of individual transactions, to studying how people use the web, to mounting cameras in the lounge, to building hallway robots that capture data about passersby, to building intelligent workstation assistants that learn user habits.

    This forum will address a cluster of questions on this topic, including:

    1. How do we wish to govern ourselves with regard to privacy issues?  Should we suggest our own set of guidelines, or propose changes to the procedures currently in place?
    2. What is the role of CMU's Institutional Review Board (IRB) in overseeing CS research that may impact individual privacy?  How can CS researchers best prepare for an IRB review of their research?
    3. What areas of CS research have potential impact (both positive and negative) on privacy?  Are there new research areas we should engage, to directly address privacy issues?
    4. How do the issues faced at CMU generalize to the broader computer science community, and what should we do at CMU to set an example for that community?

      The first session will be top-down, focusing on the broad landscape: (1) the general privacy regulatory landscape related to CS research; (2) Human Subject regulations and IRBs; and (3) the role of privacy technology in satisfying IRB and HIPAA requirements. The second session will be bottom-up, focusing on three sample research projects as case studies. [Presenter: SCS faculty]

    6. Training Chief Privacy Officers
      Most major corporations and high-tech firms of all sizes have a senior executive-level position named "Chief Privacy Officer." In general, the privacy officer oversees all ongoing activities related to the development, implementation, and maintenance of, and adherence to, the organization’s policies and procedures covering the privacy of, and access to, person-specific information, in compliance with federal and state laws and the organization’s information privacy practices. Here are three key responsibilities of a chief privacy officer:

      • Performs initial and periodic information privacy risk assessments and conducts related ongoing compliance monitoring activities in coordination with the entity’s other compliance and operational assessment functions.
      • Works with legal counsel and management, key departments, and committees to ensure the organization has and maintains appropriate privacy and confidentiality consent, authorization forms, and information notices and materials reflecting current organization and legal practices and requirements.
      • Oversees, directs, delivers, or ensures delivery of initial privacy training and orientation to all employees, volunteers, medical and professional staff, contractors, alliances, business associates, and other appropriate third parties.

      Qualifications for being a privacy officer increasingly require certification. Carnegie Mellon University already has an executive certification program, so extending it to certify privacy officers seems appropriate. The question is what training is needed beyond the general executive certification training. This session explores this question. [Presenter: William Ferguson]

    7. Privacy Rights in the KALI Project
      Information about individuals is currently maintained in many thousands of databases, with much of that information, such as name and address, replicated across multiple databases.  However, this proliferation of personal information raises issues of both privacy for the individual and the accuracy of the information.  Ideally, each individual would own, maintain and control their personal information.  This talk presents the idea of users owning their personal information in the context of the KALI project, an on-going research project at Dalhousie University, Canada. [Presenter: Carrie Gates]

    8. Computer Science Research and the IRB Process
      Federally-funded computer science research is increasingly using real-world, person-specific information. The Human Subjects regulation, to which this research must conform, requires a review by the Institutional Review Board prior to conducting the research. But the definitions of data and human subjects, and the overall intent of the regulation, take on different meanings in the context of computer science research than in the context of medical research, in which these regulations originated. In this session, we will examine and discuss these differences. [Presenter: Alex London]

    9. k-Anonymous Messaging
      In this session, we will examine some recent work in the theory group at SCS on a protocol for sending a message in which the sender is anonymous. The idea relates to k-anonymity, in that the message provably comes from one of k possible senders. The ambiguity provides the protection. The k-anonymous messaging protocol will be introduced, and possible extensions of the protocol to traditional data privacy problems will be explored. [Presenter: Andrew Bortz]

    10. IBM's Enterprise Privacy Authorization Language [EPAL]
      Most consumer-oriented privacy policies (for example those expressed in W3C's P3P standard) are formulated in very broad terms. Enforcing such a policy within an enterprise requires a much more detailed policy.  EPAL, developed by IBM Research, is a formal language for writing enterprise privacy policies to govern data handling practices in IT systems according to fine-grained positive and negative authorization rights.

      The discussion will cover the following topics:

      1. The current environment of privacy policy specification and enforcement, and relevant shortcomings of P3P.
      2. An overview and critique of the EPAL specification, and discussion on its use and implementations.
      3. A comparison of the EPAL specification to the PRM work that we have discussed in previous meetings.

      [Presenter: Brian Carini]


Summer 2003 Data Privacy Laboratory [LIDAP@dataprivacylab.org]