Topics in Privacy

Topics in Privacy (TIP) consists of weekly discussions and brainstorming sessions on all aspects of privacy. Discussions are often inspired by a real-world privacy problem being faced by the lead discussant, who may be from industry, government, or academia. Practice talks and presentations on specific techniques and topics are also common.

The following schedule and descriptions are tentative. Topics are usually not posted earlier than the week before.

Schedule Spring 2012

Date  Speaker/Discussant  Topic
3/14  group discussion  Semester Launch
3/21  Bernard Rous (ACM Director of Publications)  Confidentiality of Researcher Registries
3/28  group discussion  Issue Spotting: Academic Publication Registries
4/4   Bernard Rous and David Abrams  Academic Publication Registries as Academic Credit Bureaus
4/11  David Abrams (Harvard Law School)  Supreme Court Decision in the Jones Case on GPS Privacy
4/18  group discussion  Issue Spotting: The Jones GPS Privacy Case
4/25  Latanya Sweeney (Data Privacy Lab, Harvard)  MyDataCan™: Your Data in Your Hands
5/2   Vashek Matyas (Masaryk University Brno, CZ)  Monetary Valuation of Private Information
5/9   Ethan Zuckerman (MIT)  Possible Metrics for Internet Freedom - Potentials and Challenges
5/16  Robert Gellman (Privacy and Information Policy Consultant)  Current Activities in Privacy: the Department of Commerce and the New Multistakeholder Process
5/23  Latanya Sweeney (Data Privacy Lab, Harvard)  Designing theDataMap™
5/30  Salil Vadhan (Vicky Joseph Professor of Computer Science and Applied Mathematics, Harvard)  Privacy for Social Science Research
6/6   Richard Sobel (Harvard Medical School)  The HIPAA Paradox
6/13  Merce Crosas, Micah Altman, Latanya Sweeney  PrivacyAccord: Towards a Creative Commons for Data Sharing
6/20  John Freedman, MD (Freedman Healthcare)  Nationwide Medical Claims Database under Patient Control

Abstracts of Talks and Discussions

  1. Semester Launch

    We will launch this semester of TIP discussions with a brainstorming overview of two TIP-based projects to tackle this semester: (1) privacy tags for databases; and (2) a whistle-blower website for reporting data issues. We will also have some extended time this week to allow everyone to report on their own recent activities.

  2. Confidentiality of Researcher Registries

    How should research contributor identifier registries approach confidentiality?

    ORCID aims to solve the international author/contributor name ambiguity problem in scholarly communications by creating a central registry of unique identifiers for individual researchers and an open and transparent linking mechanism between ORCID and other current author ID schemes. These identifiers, and the relationships among them, can be linked to the researcher's output to enhance the scientific discovery process and to improve the efficiency of research funding and collaboration within the research community.

    The core service provided by ORCID is a registry of assigned identifiers, along with information corresponding to each author's "public" identity, including their name and institutional affiliation.

    ORCID is considering the storage of other information related to individual authors, such as their history of institutional affiliations, grants, and publications. This would be used both to maintain the integrity of the core registry (e.g., to aid in the disambiguation of author names) and to offer other services.

    • What information about authors and publications should be considered unambiguously "public"?
      • Does the proposed ORCID Profile Element set pose any major problems in this regard?

    • What are potential sensitive issues related to publications and author data?
      • For example, a registry that included the author's professional affiliation could be used to construct an employment history. Should this be viewed as private sensitive information?

    • ORCID expects institutions to deposit profiles on behalf of their faculty in Phase 1.
      • What permission if any does the institution need to have in order to create the profiles in the first place?
      • What permission if any does the institution need in order to deposit them in ORCID? (given our particular element set)
      • Would a university in the EU be able to compile faculty profiles with or without permission? Would it be able to deposit them in ORCID?
      • How can the registry provide for linking with institutional records, and support authors' ability to "claim" publications and to correct their publication record, while minimizing the private sensitive information collected?

    • ORCID has a principle that all profiles created by a researcher, or "claimed" by that researcher, will be made available free of charge to the public once a year, with no constraints on their usage other than not duplicating the ORCID service.
      • For institutionally deposited profiles that are not claimed, who can legally see and use these profiles?
      • Who controls what elements are visible and sharable? Must the profiles remain hidden unless they are first "claimed" by a researcher?
      • Who controls who gets to see the profiles?

    • ORCID has a principle that researchers control their own privacy settings at the granular level of each element, for both the profiles they create and those they claim.
      • Is such granularity and complexity necessary? Can you suggest simplifications?
      • Can ORCID mandate that some elements be public? (Obviously, this is most important for institutionally deposited profiles, because the researcher can simply not enter elements that he or she wants hidden.)

    • ORCID has built three privacy control settings: private, public, and protected. The definition of "protected" is that the researcher determines who can see the protected elements.
      • Given the ORCID profile element set, does this "protected" setting make sense?
      • Is this setting required for the good will of the researcher community?
      • Does it make sense just to have two privacy control settings: public or private?

    • In addition to privacy control settings, should researchers be given total edit control over all elements (or some elements) in a deposited profile they claim as their own, including add, change, and delete?

    • What are the legal concerns for ORCID itself to collect publicly available data and create profiles?

    • What sort of security policy is required of ORCID for it to be a trusted organization?
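    The element-level privacy settings raised in the questions above (private, public, and protected, with "protected" meaning the researcher decides who may see an element) can be sketched as a small data model. This is a hypothetical illustration only, not ORCID's actual implementation; all class and field names here are invented:

    ```python
    # Hypothetical sketch of element-level privacy settings (not ORCID's
    # actual data model). Each profile element carries its own visibility.
    from dataclasses import dataclass, field
    from enum import Enum

    class Visibility(Enum):
        PUBLIC = "public"        # visible to everyone
        PROTECTED = "protected"  # visible only to viewers the researcher names
        PRIVATE = "private"      # visible only to the researcher

    @dataclass
    class ProfileElement:
        name: str                # e.g. "affiliation"
        value: str
        visibility: Visibility = Visibility.PRIVATE
        allowed_viewers: set = field(default_factory=set)  # used when PROTECTED

        def visible_to(self, viewer: str, owner: str) -> bool:
            """Return True if `viewer` may see this element of `owner`'s profile."""
            if viewer == owner:
                return True
            if self.visibility is Visibility.PUBLIC:
                return True
            if self.visibility is Visibility.PROTECTED:
                return viewer in self.allowed_viewers
            return False  # PRIVATE

    # Usage: an affiliation shared only with one named party.
    elem = ProfileElement("affiliation", "Harvard",
                          Visibility.PROTECTED, {"harvard.edu"})
    print(elem.visible_to("harvard.edu", owner="orcid:0000-0000"))   # True
    print(elem.visible_to("elsewhere.org", owner="orcid:0000-0000")) # False
    ```

    Even this minimal sketch shows where the complexity comes from: with per-element settings, every element needs its own viewer list, which is part of why the questions above ask whether two settings (public/private) would suffice.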

  3. Issue Spotting: Academic Publication Registries

    In this session, we will brainstorm over privacy and trust issues in academic registries using the ORCID model as an exemplar. See talk #2 above.

  4. Academic Publication Registries as Academic Credit Bureaus

    This is our concluding session for issue spotting. The academic publishers and developers of ORCID will be present and pose specific questions. In the prior session, Bob Gellman posed a model of these registries as "academic credit bureaus". David Abrams will share issues related to this characterization. See talks #2 and #3 above.

  5. Supreme Court Decision in the Jones Case on GPS Privacy

    The Supreme Court's decision in United States v. Jones concerns the warrantless installation of a GPS tracking device on a private automobile. In this session, David will briefly review the history of Fourth Amendment search cases leading up to Jones, then describe the oral argument, followed by the specifics of the three opinions finding the installation in Jones in violation of the Fourth Amendment. Finally, we will discuss the decision in terms of electronic privacy in the future.

    For more information, see

    Bio: David Abrams is a fellow at the Berkman Center for Internet & Society at Harvard and he is the Program Director for the new first-year Problems and Theories course at Harvard Law School. He received degrees in electrical engineering from M.I.T. a while ago and spent twenty-five years designing hardware and software before going to law school. He is interested in the relationship between law and technology, particularly how to apply both effectively to reduce undesirable behavior.

  6. Issue Spotting: The Jones GPS Privacy Case

    In this session, we will brainstorm on directions for law in the face of today's data collections and sharing practices and the Supreme Court's decision in United States v. Jones on GPS Privacy.

  7. MyDataCan™: Your Data in Your Hands

    MyDataCan™ seeks to be a long-term, publicly available online data service that will serve as a hub for personal data sharing. Members of the public can collect, assemble, and distribute their own personal data across disparate data silos, including health information, without a fee, and can optionally elect to participate in activities that use a person's data to improve the quality of his or her life. Most of these activities are third-party applications ("apps") to which a participant personally subscribes. Participants may also be asked to participate in research, but no research participation is required. MyDataCan™ is a living-lab research project at Harvard. Its research aims are various but, in terms of privacy, include developing and studying notions of "personal access control" and "privacy-preserving marketplaces" as mechanisms for data sharing, and assessing privacy and privacy governance when data subjects directly participate in data-sharing arrangements. In this talk, I will introduce the privacy model of MyDataCan™ and discuss its privacy promise.

    For more information, see

  8. Monetary Valuation of Private Information

    This talk will present the results of two experiments whose primary goal was to assess the economic value that people attach to their private information. The private information considered in the first experiment was the geographic location of the person, which would be monitored through a mobile phone. The second experiment focused on information related to the usage of online communication tools (emails and instant messaging), which would be collected by proprietary monitoring software. In both cases, people were asked to bid for the remuneration they would require for participating in such an experiment. We estimated the monetary value of private information in two general scenarios - data collected for academic research and for commercial purposes. This work was done together with Marek Kumpost, Claudia Diaz, Sandra Steinbrecher, George Danezis, Stefan Kopsell and many others within the EU Network of Excellence FIDIS - Future of Identity in the Information Society.

    Bio: Vashek Matyas is a Fulbright-Masaryk Visiting Scholar at the Center for Research on Computation and Society (CRCS) at Harvard University and a Professor at the Masaryk University Brno, CZ. His research interests relate to applied cryptography and security. He worked with Microsoft Research Cambridge, University College Dublin, Ubilab at UBS AG, and was a Royal Society Postdoctoral Fellow with the Cambridge University Computer Lab. Vashek edited the Computer and Communications Security Reviews, and worked on the development of Common Criteria and with ISO/IEC JTC1 SC27.

  9. Possible Metrics for Internet Freedom - Potentials and Challenges

    The emergence of the "network public sphere" as a space for political discussion and debate globally is shifting dialogs about press freedom toward dialogs about internet freedom. The emergence of metrics like Freedom House's Global Assessment of Freedom on the Net raises questions about how we consider the multiple facets of internet freedom and whether these metrics suffer from the same subjectivities as metrics of press freedom. Ethan Zuckerman will review some of the research on the various restrictions on online speech that are turning the network public sphere into a contested space and suggest possible strategies for data-driven metrics to measure internet censorship and freedom.

    Bio: Ethan Zuckerman is director of the Center for Civic Media at MIT, and a principal research scientist at MIT's Media Lab. His research focuses on the distribution of attention in mainstream and new media, the use of technology for international development, and the use of new media technologies by activists. With Rebecca MacKinnon, Ethan co-founded international blogging community Global Voices. Global Voices showcases news and opinions from citizen media in over 150 nations and thirty languages, publishing editions in twenty languages. Through Global Voices and through the Berkman Center for Internet and Society at Harvard University, where he served as a researcher and fellow for eight years, Ethan is active in efforts to promote freedom of expression and fight censorship in online spaces.

  10. Current Activities in Privacy: the Department of Commerce and the New Multistakeholder Process

    The Commerce Department and the White House recently issued a White Paper: "Consumer Data Privacy in a Networked World: A Framework For Protecting Privacy and Promoting Innovation in the Global Digital Economy". The document proposes a multistakeholder process "to specify how the principles in the Consumer Privacy Bill of Rights apply in particular business contexts." Commerce is moving ahead to implement the multistakeholder process. Bob Gellman will review the current state of play in Washington, the proposal for a Consumer Bill of Rights, and the prospects for progress on privacy. One topic for discussion is whether and how academics might play a role in the multistakeholder process.

    Robert Gellman is a privacy and information policy consultant in Washington, D.C., specializing in health confidentiality policy, privacy and data protection, and Internet privacy. Clients have included federal agencies, Fortune 500 companies, trade associations, advocacy groups, foreign governments, and others. A graduate of Yale Law School, Gellman served for 17 years as chief counsel to the Subcommittee on Government Information in the House of Representatives. He maintains a webpage with many documents and other useful resources. He is coauthor of Online Privacy: A Reference Handbook, published by ABC-CLIO in 2011.

  11. Designing theDataMap™

    theDataMap™ promises to be an online portal for documenting flows of personal data. Its goal is to make data sharing more transparent, and it does so by engaging the public in collective data-sharing discovery. It is scheduled to go live at a major privacy debut in DC in a couple of weeks, yet many final design decisions remain open because parallel designs have been pursued. The challenges relate to designing a system that provides the right incentives for truthful and helpful information to emerge. How do you get information about data sharing? How do you make sure it is accurate? And how do you achieve this on a small budget? In this session, we will look at these designs and engage in issue spotting and brainstorming on all aspects of the project.

    For more information, see

  12. Privacy for Social Science Research

    We will describe and answer questions about a multidisciplinary research project at Harvard on "Privacy for Social Science Research", in preparation for a presentation to NSF. A summary of the project follows:

    Information technology, advances in statistical computing, and the deluge of data available through the Internet are transforming social science. With the ability to collect and analyze massive amounts of data on human behavior and interactions, social scientists can hope to uncover many more phenomena, with greater detail and confidence, than allowed by traditional means such as surveys and interviews. In addition to advancing the state of knowledge, the rich analysis of behavioral data can enable companies to better serve their customers, and governments their citizenry.

    However, a major challenge for computational social science is maintaining the privacy of human subjects. At present, an individual social science researcher is left to devise her own privacy shields, such as stripping the dataset of "personally identifiable information" (PII). However, such privacy shields are often ineffective and provide limited or no real-world privacy protection. Indeed, there have been a number of cases where the individuals in a supposedly anonymized dataset have been re-identified. At the same time, social scientists are increasingly analyzing complex forms of data, such as large social networks, spatial trajectories, and semi-structured text, that are even less amenable to naive attempts at anonymization.

    Beyond harm that may be suffered by the subjects themselves, such privacy violations are a serious threat to the future of computational social science research. After a few serious and highly publicized incidents, it may become much harder for researchers to obtain good social science data. Subjects may be reluctant to participate in experiments, data holders may become subject to stifling regulation, and companies may refuse to share proprietary data out of fear of lawsuits or bad public relations.

    This project is a broad, multidisciplinary effort to help enable the collection, analysis, and sharing of social science data while providing privacy for individual subjects. Bringing together computer science, social science, statistics, and law, the investigators seek to refine and develop definitions and measures of privacy and data utility, and design an array of technological, legal, and policy tools for social scientists to use when dealing with sensitive data.

    These tools will be tested and deployed at the Harvard Institute for Quantitative Social Science's Dataverse Network, an open-source digital repository that offers the largest catalogue of social science datasets in the world. Our aim is to provide social scientists with a technological and legal framework that embodies the modern computational understanding of privacy, and a reliable open infrastructure that aids in the management of confidential research data from collection through dissemination.

    Presenter and Lead Discussant: Salil Vadhan

  13. The HIPAA Paradox

    HIPAA is often described as a privacy rule. It is not. In fact, HIPAA is a disclosure regulation, and it has effectively dismantled the longstanding moral and legal tradition of patient confidentiality. By permitting broad and easy dissemination of patients' medical information, with no audit trails for most disclosures, it has undermined both medical ethics and the effectiveness of medical care.

    In this session, Dr. Sobel will talk about and discuss his recent paper, which is accessible below:

    Richard Sobel, "The HIPAA Paradox: The Privacy Rule That's Not," Hastings Center Report 37, no. 4 (2007): 40-50. (PDF)

    Bio: Richard Sobel explores the relationships between citizens and governments as a Senior Research Associate in the Program in Psychiatry and the Law at Harvard Medical School, and a Senior Research Fellow and Policy Director at the Roper Center for Public Opinion Research in Storrs, CT. His work includes the policy analysis of privacy and confidentiality issues, particularly on constitutional and political questions about governmental databanks and identification schemes. It also explores the influence of public opinion on foreign policy in the U.S. and abroad. The privacy and foreign policy strands became more closely allied in the post-9/11 era of concerns for how international issues like anti-terrorism affect the domestic realm, including civil liberties.

  14. PrivacyAccord: Towards a Creative Commons for Data Sharing

    A privacy tag is an attribute assigned to a dataset that asserts privacy characteristics about the dataset. The basic idea is that there is a small dictionary of defined tags, where one or a few tags from the dictionary collectively describe a dataset's privacy characteristics. One of the most successful uses of tags is in Creative Commons licenses, which help people and organizations share creative works openly with attribution for their intellectual property. The vision of this work is to establish a kind of "privacy commons", which we currently term "PrivacyAccord", to help researchers and data collectors share person-specific data openly while respecting privacy safeguards. A PrivacyAccord agreement gives the data provider the ability to assert privacy commitments and governing standards and to make sure that everyone who comes into contact with the data respects and replicates those commitments. For example, if Bob has a copy of Alice's PrivacyAccord-licensed data, he can give a copy to Carol, and Carol will be authorized to use the data provided she respects the same privacy commitments made by Alice, as expressed through Alice's original PrivacyAccord license. In this talk, we will brainstorm on privacy tags and the PrivacyAccord approach using IQSS' Dataverse as a real-world model for possible implementation.

    Discussants: Merce Crosas, Micah Altman, Latanya Sweeney
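    The tag dictionary and the propagation of commitments from Alice to Bob to Carol described above can be sketched as follows. This is a hypothetical illustration of the PrivacyAccord idea, not an actual implementation; the tag names are invented:

    ```python
    # Hypothetical sketch of PrivacyAccord-style tags (illustrative only).
    # A small dictionary of defined tags; every copy of a dataset carries
    # the provider's original tags, and sharing requires accepting them.
    TAG_DICTIONARY = {
        "NO-REIDENTIFY",        # recipients may not attempt re-identification
        "NO-REDISTRIBUTE-RAW",  # raw records may not be redistributed
        "RESEARCH-ONLY",        # use limited to research purposes
    }

    class TaggedDataset:
        def __init__(self, provider, tags):
            unknown = set(tags) - TAG_DICTIONARY
            if unknown:
                raise ValueError(f"tags not in dictionary: {unknown}")
            self.provider = provider
            self.tags = frozenset(tags)  # immutable: commitments travel with the data

        def share(self, recipient_accepts_tags):
            # A copy is authorized only if the recipient accepts the same
            # commitments the original provider asserted.
            if not recipient_accepts_tags(self.tags):
                raise PermissionError("recipient did not accept the provider's commitments")
            return TaggedDataset(self.provider, self.tags)

    # Alice licenses her data; Bob's copy carries her tags, and so does Carol's.
    alice = TaggedDataset("alice", {"RESEARCH-ONLY", "NO-REIDENTIFY"})
    bob = alice.share(lambda tags: True)
    carol = bob.share(lambda tags: True)
    assert carol.tags == alice.tags
    ```

    The key design point mirrored here is that the tags are fixed at licensing time and copied verbatim on every share, so Carol's obligations are exactly Alice's original commitments, no matter how many hands the data passes through.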

  15. Nationwide Medical Claims Database under Patient Control

    Numerous initiatives around the country are assembling "all-payer databases". These are statewide and national collections of copies of medical claims paid or presented for payment to health insurance companies. Patient medical data (including diagnosis and procedure codes as well as costs and demographics) flow from the insurance company to these databases without explicit patient knowledge or permission. Most initiatives involve a government or special organization housing and controlling access to the data. At least one proposal houses a nationwide collection at an academic facility. In this talk, Dr. Freedman will talk about how we might obtain and use claims data under patient control in MyDataCan and how we might acquire claims data to establish a nationwide medical claims database for research purposes, housed at Harvard.

    Bio: John Freedman, MD, is the founder and principal of Freedman Healthcare, a small Newton (MA) firm that has extensive experience with All Payer Claims Databases and expertise with Quality of Care, Analytics, ACO issues, and Performance Measurements using these data.

Last Semester

Fall 2011

Copyright © 2012-2014. President and Fellows of Harvard University.   |   IQSS   |    Data Privacy Lab