Topics in Privacy
The following schedule and descriptions are tentative. Topics are usually not posted earlier than the week before.
We will start this semester's TIP discussions with a brainstorming overview of two specific TIP-based projects to tackle: (1) privacy tags for databases, and (2) a whistle-blower website for reporting data issues. We will also have some extended time this week for everyone to report on their own recent activities.
How should research contributor identifier registries approach confidentiality?
ORCID aims to solve the international author/contributor name ambiguity problem in scholarly communications by creating a central registry of unique identifiers for individual researchers and an open and transparent linking mechanism between ORCID and other current author ID schemes. These identifiers, and the relationships among them, can be linked to the researcher's output to enhance the scientific discovery process and to improve the efficiency of research funding and collaboration within the research community.
The core service provided by ORCID is a registry of assigned identifiers, along with information corresponding to each author's "public" identity, including their name and institutional affiliation.
ORCID is considering the storage of other information related to individual authors, such as their history of institutional affiliations, grants, and publications. This would be used both to maintain the integrity of the core registry (e.g., to aid in the disambiguation of author names) and to offer other services.
In this session, we will brainstorm over privacy and trust issues in academic registries using the ORCID model as an exemplar. See talk #2 above.
This is our concluding session for issue spotting. The academic publishers and developers of ORCID will be present and will pose specific questions. In the prior session, Bob Gellman posed a model of these registries as "academic credit bureaus". David Abrams will share issues related to this characterization. See talks #2 and #3 above.
The Supreme Court's decision in United States v. Jones concerns the warrantless installation of a GPS tracking device on a private automobile. In this session, David will briefly review the history of Fourth Amendment search cases leading up to Jones, then describe the oral argument followed by the specifics of the three opinions finding the installation in Jones in violation of the Fourth Amendment. Finally, we will discuss what the decision means for electronic privacy in the future.
For more information, see https://www.blacklinetracking.com/.
Bio: David Abrams is a fellow at the Berkman Center for Internet & Society at Harvard and he is the Program Director for the new first-year Problems and Theories course at Harvard Law School. He received degrees in electrical engineering from M.I.T. a while ago and spent twenty-five years designing hardware and software before going to law school. He is interested in the relationship between law and technology, particularly how to apply both effectively to reduce undesirable behavior.
In this session, we will brainstorm on directions for the law in the face of today's data collection and sharing practices and the Supreme Court's decision in United States v. Jones on GPS privacy.
MyDataCan™ seeks to be a long-term publicly available online data service that will
serve as a hub for personal data sharing. Members of the public can collect, assemble,
and distribute their own personal data, across disparate data silos, including health information,
without a fee, and optionally elect to participate in activities that use a person's data to
improve the quality of that person's life. Most of these activities are third-party applications ("apps")
to which a participant personally subscribes. Participants may also be asked to participate in
research, but no research participation is required. MyDataCan™ is a living lab research
project at Harvard. Research aims are varied but, in terms of privacy, include developing and
studying notions of "personal access control" and "privacy-preserving marketplaces"
as mechanisms for data sharing, and assessing privacy and privacy governance when data subjects
directly participate in data sharing arrangements. In this talk, I will introduce the privacy
model of MyDataCan™ and discuss its privacy promise.
For more information, see https://mydatacan.org/.
This talk will present the results of two experiments whose primary
goal was to assess the economic value that people attach to their
private information. The private information considered in the first
experiment was the geographic location of the person, which would be
monitored through a mobile phone. The second experiment focused on
information related to the usage of online communication tools (emails
and instant messaging), which would be collected by proprietary
monitoring software. In both cases, people were asked to bid for the
remuneration they would require for participating in such an
experiment. We estimated the monetary value of private information in
two general scenarios - data collected for academic research and for
commercial purposes. This work was done together with Marek Kumpost,
Claudia Diaz, Sandra Steinbrecher, George Danezis, Stefan Kopsell and
many others within the EU Network of Excellence FIDIS - Future of
Identity in the Information Society.
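The valuation method described above can be sketched in code. This is an illustrative reconstruction, not the authors' actual methodology or data: participants submit willingness-to-accept bids for each scenario, and a summary statistic over the bids estimates the monetary value of the private information.

```python
# Illustrative sketch (hypothetical bids, not the FIDIS study's data):
# estimating the monetary value people attach to private information
# from willingness-to-accept bids for participating in monitoring.
from statistics import median, mean

def summarize_bids(bids):
    """Summarize willingness-to-accept bids (in EUR) for one scenario."""
    return {
        "n": len(bids),
        "median": median(bids),  # robust central estimate of the value
        "mean": mean(bids),      # sensitive to outlier bids
        "min": min(bids),
        "max": max(bids),
    }

# Hypothetical bids for the two scenarios described in the talk:
academic = [20, 35, 50, 50, 80, 120]       # data collected for research
commercial = [40, 60, 100, 150, 200, 500]  # data collected commercially

print(summarize_bids(academic)["median"])    # 50.0
print(summarize_bids(commercial)["median"])  # 125.0
```

Comparing the medians across scenarios is one simple way to quantify whether people demand more remuneration when the same data would be used commercially rather than academically.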
Bio: Vashek Matyas is a Fulbright-Masaryk Visiting Scholar at the Center
for Research on Computation and Society (CRCS) at Harvard University and
a Professor at Masaryk University in Brno, Czech Republic. His research interests
relate to applied cryptography and security. He worked with Microsoft
Research Cambridge, University College Dublin, Ubilab at UBS AG, and
was a Royal Society Postdoctoral Fellow with the Cambridge University
Computer Lab. Vashek edited the Computer and Communications Security
Reviews, and worked on the development of Common Criteria and with
ISO/IEC JTC1 SC27.
The emergence of the "network public sphere" as a space for political discussion and debate globally is shifting dialogs about press freedom toward dialogs about internet freedom.
The emergence of metrics like Freedom House's Global Assessment of Freedom on the Net raises questions about how we consider the multiple facets of internet freedom and whether these metrics suffer from the same subjectivity as metrics of press freedom. Ethan Zuckerman will review some of the research on the various restrictions on online speech that are turning the network public sphere into a contested space and suggest possible strategies for data-driven metrics to measure internet censorship and freedom.
Bio: Ethan Zuckerman is director of the Center for Civic Media at MIT, and a principal research scientist at MIT's Media Lab. His research focuses on the distribution of attention in mainstream and new media, the use of technology for international development, and the use of new media technologies by activists. With Rebecca MacKinnon, Ethan co-founded international blogging community Global Voices. Global Voices showcases news and opinions from citizen media in over 150 nations and thirty languages, publishing editions in twenty languages. Through Global Voices and through the Berkman Center for Internet and Society at Harvard University, where he served as a researcher and fellow for eight years, Ethan is active in efforts to promote freedom of expression and fight censorship in online spaces.
The Commerce Department and the White House recently issued a White Paper: "Consumer Data Privacy in a Networked World: A Framework For Protecting Privacy and Promoting Innovation in the Global Digital Economy".
The document proposes a multistakeholder process "to specify how the principles in the Consumer Privacy Bill of Rights apply in particular business contexts." Commerce is moving ahead to implement the multistakeholder process. Bob Gellman will review the current state of play in Washington, the proposal for a Consumer Bill of Rights, and the prospects for progress on privacy. One topic for discussion is whether and how academics might play a role in the multistakeholder process.
Robert Gellman is a privacy and information policy consultant in Washington, D.C., specializing in health confidentiality policy, privacy and data protection,
and Internet privacy. Clients have included federal agencies, Fortune 500 companies, trade associations, advocacy groups, foreign governments, and others.
A graduate of the Yale Law School, Gellman served for 17 years as chief counsel to the Subcommittee on Government Information in the House of Representatives.
He maintains a webpage with many documents and other useful resources at www.bobgellman.com. He is coauthor of Online Privacy: A Reference Handbook, published by ABC-CLIO in 2011.
theDataMap™ promises to be an online portal for documenting flows of personal data.
Its goal is to make data sharing more transparent, and it does so by engaging the public in
collective data sharing discovery. It is scheduled to go live at a big privacy debut in DC
in a couple of weeks. Yet many final design decisions remain open, as parallel designs
have been pursued. Challenges relate to how to design a system that provides the right
incentives for truthful and helpful information to emerge. How do you get information
about data sharing? How do you make sure it is accurate? And how do you achieve this
on a small budget? In this session, we will look at these designs and engage in issue
spotting and brainstorming on all aspects of the project.
For more information, see theDataMap.org.
We will describe and answer questions about a multidisciplinary research project at Harvard on "Privacy for Social Science Research", in preparation for a presentation to NSF. A summary of the project follows:
Information technology, advances in statistical computing, and the deluge of data available through the Internet are transforming social science. With the ability to collect and analyze massive amounts of data on human behavior and interactions, social scientists can hope to uncover many more phenomena, with greater detail and confidence, than allowed by traditional means such as surveys and interviews. In addition to advancing the state of knowledge, the rich analysis of behavioral data can enable companies to better serve their customers, and governments their citizenry.
However, a major challenge for computational social science is maintaining the privacy of human subjects. At present, an individual social science researcher is left to devise her own privacy shields, such as stripping the dataset
of "personally identifiable information" (PII). However, such privacy shields are often ineffective and provide limited or no real-world privacy protection. Indeed, there have been a number of cases where the individuals in a supposedly anonymized dataset have been re-identified. At the same time, social scientists are increasingly analyzing complex forms of data, such as large social networks, spatial trajectories, and semi-structured text, that are even less amenable to naive attempts at anonymization.
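The failure mode described above can be made concrete with a small sketch. All names and records below are fabricated for illustration: a "de-identified" dataset that retains quasi-identifiers (ZIP code, date of birth, sex) can be joined against a public record, such as a voter registration list, to put names back on supposedly anonymous rows.

```python
# A minimal sketch of why stripping PII is often insufficient: two
# datasets sharing quasi-identifiers (ZIP code, date of birth, sex)
# can be linked to re-identify "anonymized" records.
# All records here are fabricated for illustration.

# "De-identified" research dataset: names removed, diagnosis retained.
research = [
    {"zip": "02138", "dob": "1945-07-31", "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "dob": "1982-03-14", "sex": "M", "diagnosis": "flu"},
]

# Public record with names (e.g., a voter registration list).
voters = [
    {"name": "A. Example", "zip": "02138", "dob": "1945-07-31", "sex": "F"},
    {"name": "B. Example", "zip": "02140", "dob": "1990-01-01", "sex": "M"},
]

def reidentify(research, voters):
    """Link records on the quasi-identifier triple (zip, dob, sex)."""
    key = lambda r: (r["zip"], r["dob"], r["sex"])
    names = {key(v): v["name"] for v in voters}
    return [
        {"name": names[key(r)], **r}
        for r in research
        if key(r) in names
    ]

for match in reidentify(research, voters):
    print(match["name"], "->", match["diagnosis"])  # prints: A. Example -> asthma
```

No single field in the research dataset is identifying on its own; it is the combination of quasi-identifiers, available in an outside dataset, that defeats the naive anonymization.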
Beyond harm that may be suffered by the subjects themselves, such privacy violations are a serious threat to the future of computational social science research. After a few serious and highly publicized incidents, it may become much harder for researchers to obtain good social science data. Subjects may be reluctant to participate in experiments, data holders may become subject to stifling regulation, and companies may refuse to share proprietary data out of fear of lawsuits or bad public relations.
This project is a broad, multidisciplinary effort to help enable the collection, analysis, and sharing of social science data while providing privacy for individual subjects. Bringing together computer science, social science, statistics, and law, the investigators seek to refine and develop definitions and measures of privacy and data utility, and design an array of technological, legal, and policy tools for social scientists to use when dealing with sensitive data.
These tools will be tested and deployed at the Harvard Institute for Quantitative Social Science's Dataverse Network, an open-source digital repository that offers the largest catalogue of social science datasets in the world. Our aim is to provide social scientists with a technological and legal framework that embodies the modern computational understanding of privacy, and a reliable open infrastructure that aids in the management of confidential research data from collection through dissemination.
Presenter and Lead Discussant: Salil Vadhan
HIPAA is often described as a privacy rule. It is not. In fact, HIPAA is a disclosure regulation, and it
has effectively dismantled the longstanding moral and legal tradition of patient confidentiality. By permitting
broad and easy dissemination of patients' medical information, with no audit trails for most disclosures, it has
undermined both medical ethics and the effectiveness of medical care.
In this session, Dr. Sobel will present and discuss his recent paper, which is accessible below:
Bio: Richard Sobel explores the relationships between citizens and governments as a Senior Research
Associate in the Program in Psychiatry and the Law at Harvard Medical School, and a Senior Research
Fellow and Policy Director at the Roper Center for Public Opinion Research in Storrs, CT. His work
includes the policy analysis of privacy and confidentiality issues, particularly on constitutional
and political questions about governmental databanks and identification schemes. It also explores
the influence of public opinion on foreign policy in the U.S. and abroad. The privacy and foreign
policy strands became more closely allied in the post-9/11 era of concern about how international
issues like anti-terrorism affect the domestic realm, including civil liberties.
A privacy tag is an attribute assigned to a dataset that asserts privacy characteristics about the dataset.
The basic idea is that there is a small dictionary of defined tags, where one or few tags from the dictionary
collectively describe privacy characteristics in a dataset. One of the most successful uses of tags is in
Creative Commons licenses to help people and organizations share creative works openly with attribution for
their intellectual property. The vision of this work is to establish a kind of "privacy commons", which we
currently term, "PrivacyAccord", to help researchers and data collectors share person-specific data openly
while respecting privacy safeguards. A PrivacyAccord agreement gives the data provider the ability to assert
privacy commitments and governing standards and to make sure that everyone who comes into contact with the data
respects and replicates those commitments. For example, if Bob has a copy of Alice's PrivacyAccord-licensed data,
he can give a copy to Carol, and Carol will be authorized to use the data provided she respects the same privacy
commitments made by Alice, as expressed through Alice's original PrivacyAccord license.
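The propagation idea above can be sketched as follows. The tag names and the API are hypothetical illustrations invented for this sketch, not an actual PrivacyAccord specification: a small dictionary defines the tags, and a license check ensures that every downstream copy carries the same commitments.

```python
# Hypothetical sketch of privacy tags: a small dictionary of defined
# tags describes a dataset's privacy characteristics, and sharing is
# allowed only when the recipient accepts every tag, so the original
# commitments replicate from Alice to Bob to Carol.
# (Tag names and API are invented for illustration.)

TAG_DICTIONARY = {
    "NO-REIDENTIFY": "Recipients must not attempt re-identification.",
    "NO-REDISTRIBUTE-UNTAGGED": "Copies must carry the same tags.",
    "RESEARCH-ONLY": "Use is limited to research purposes.",
}

class TaggedDataset:
    def __init__(self, owner, tags):
        unknown = set(tags) - set(TAG_DICTIONARY)
        if unknown:
            raise ValueError(f"unknown tags: {unknown}")
        self.owner = owner
        self.tags = frozenset(tags)

    def share_copy(self, recipient_accepted_tags):
        """Give out a copy only if the recipient accepts every tag,
        so the owner's commitments follow the data downstream."""
        if not self.tags <= set(recipient_accepted_tags):
            raise PermissionError("recipient must accept all tags")
        return TaggedDataset(self.owner, self.tags)

alice = TaggedDataset("Alice", {"NO-REIDENTIFY", "RESEARCH-ONLY"})
bob = alice.share_copy({"NO-REIDENTIFY", "RESEARCH-ONLY"})
carol = bob.share_copy({"NO-REIDENTIFY", "RESEARCH-ONLY"})
assert carol.tags == alice.tags  # commitments replicated downstream
```

Keeping the dictionary small mirrors the Creative Commons analogy: a handful of well-understood tags, composed per dataset, rather than a bespoke legal agreement for every data release.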
In this talk, we will brainstorm on privacy tags and the PrivacyAccord approach using IQSS' Dataverse as a real-world
model for possible implementation.
Discussants: Merce Crosas, Micah Altman, Latanya Sweeney
Numerous initiatives around the country are assembling "all-payer databases". These are statewide and national collections of copies of medical claims paid or presented for payment to health insurance companies. Patient medical data (including diagnosis and procedure codes as well as costs and demographics) flow from the insurance company to these databases without explicit patient knowledge or permission. Most initiatives involve a government or special organization housing and controlling access to the data. At least one proposal houses a nationwide collection at an academic facility. In this talk, Dr. Freedman will talk about how we might obtain and use claims data under patient control in MyDataCan and how we might acquire claims data to establish a nationwide medical claims database for research purposes, housed at Harvard.
Bio: John Freedman, MD is the founder and principal of Freedman Healthcare, a small Newton (MA) firm that has extensive experience with All Payer Claims Databases
and expertise with quality of care, analytics, ACO issues, and performance measurement using these data. For more information, see freedmanhealthcare.com.