
Welcome by Latanya Sweeney

Thank you all for attending this Workshop on the Ethics of Re-identification. I apologize that I am unable to be with you in person, to see your familiar faces, to welcome you and exchange hugs, and to make new friends.

My mother passed earlier this week and her services are tomorrow (Saturday). My plan was to take the last flight from Nashville to Boston last night and then return immediately after today's Workshop. But that flight was cancelled, making it impractical for me to attend and return in time. I do leave you in good hands, as I am sure Julia Brody will introduce you to our team members.

The time is ripe for re-identification to emerge as a field. Knowing the actual risks and the ways personal information may be vulnerable can ignite effective remedies that are appropriate for today's data-rich, networked society and its multitude of diverse data sharing environments. Understanding real-world risks enables practical solutions. The promise of re-identification is to find meaningful ways to share personal information widely without leaving it blindly vulnerable.

Computer security enjoyed the same cycle we need. At first, encryption was done by ad hoc methods, whatever looked good. As vulnerabilities were discovered and published, better encryption schemes emerged, until eventually computer scientists developed the forms of strong encryption we enjoy today. This is our vision for re-identification.

The cycle begins by exposing a vulnerability, which fuels a better solution, which, in turn, leads to exposing more vulnerabilities, which yield better and better solutions, and so on.

The time is ripe for the study of re-identification to emerge. First, there is substantial financial interest in opposition to better understanding re-identification risks, because companies have found lucrative revenue streams that exploit vulnerabilities. This trend will continue until eventually the money in favor of the exploits may make correction politically impossible. Second, there is so much more personal information being collected and shared today that we need better scientific knowledge about actual risks and remedies. If we do not acquire the necessary knowledge, society will assume no such options are possible. Third, k-anonymity and differential privacy illustrate that some forms of technical remedies are possible, but they each fail to capture the full spectrum of data sharing needs. In fact, it is highly unlikely that there is one magic solution. More likely there is a suite of solutions, each appropriate for different data sharing environments.

My own history of re-identification has been amazing in hindsight. My earliest result, the Weld example, has been heavily cited and underscores how unique we are based on our demographics, specifically month, day and year of birth, gender, and 5-digit ZIP (postal code). At the time, the standard was to share personal data with those demographics, and as long as the information did not have the person's name, it was not considered identifiable. William Weld, then Governor of Massachusetts, had medical data released that did not have his name but did contain his demographics, which were unique. By linking the medical data to voter data on date of birth, gender, and ZIP, his medical information could be re-identified. [We have a server for you to test your own identifiability of these demographics (here).] Despite the broad popularity of the Weld example, and its profound impact on privacy policy, no academic publication was interested in publishing the result or the broader study on the uniqueness of demographics. This gives us insight #1, publication is difficult.
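The linking step described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the method actually used in the Weld example: every record, name, field name, and ZIP value below is invented, and the helper functions `key_of` and `link` are hypothetical. It joins a name-free medical table to a named voter list on date of birth, gender, and ZIP, and treats a medical record as re-identified only when exactly one voter shares its demographics.

```python
from collections import defaultdict

# Hypothetical toy data: every record below is invented for illustration.
QUASI_IDENTIFIERS = ("birth_date", "sex", "zip")

# "De-identified" medical data: no names, but demographics are present.
medical_records = [
    {"birth_date": "1950-01-02", "sex": "M", "zip": "02100", "diagnosis": "condition A"},
    {"birth_date": "1962-07-15", "sex": "F", "zip": "02100", "diagnosis": "condition B"},
    {"birth_date": "1975-03-09", "sex": "F", "zip": "02200", "diagnosis": "condition C"},
]

# Voter list: names together with the same demographics.
voter_list = [
    {"name": "Voter One", "birth_date": "1950-01-02", "sex": "M", "zip": "02100"},
    {"name": "Voter Two", "birth_date": "1962-07-15", "sex": "F", "zip": "02100"},
    {"name": "Voter Three", "birth_date": "1962-07-15", "sex": "F", "zip": "02100"},
    {"name": "Voter Four", "birth_date": "1980-11-30", "sex": "M", "zip": "02200"},
]


def key_of(record):
    """Return the demographic tuple used to link the two datasets."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(records, named_list):
    """Pair each record with a name when exactly one named person matches."""
    names_by_key = defaultdict(list)
    for person in named_list:
        names_by_key[key_of(person)].append(person["name"])

    reidentified = []
    for record in records:
        candidates = names_by_key.get(key_of(record), [])
        # A unique match is a re-identification; zero or several matches
        # leave the record ambiguous.
        if len(candidates) == 1:
            reidentified.append((candidates[0], record["diagnosis"]))
    return reidentified


matches = link(medical_records, voter_list)
print(matches)
```

In this toy data only the first medical record links to a single voter. The second matches two voters and so stays ambiguous, and the third matches none. The more unique a combination of demographics is in the population, the more often the first case occurs.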

Then, I performed a re-identification of a popular dataset released by the U.S. Bureau of the Census. When the re-identifications were revealed to the Bureau, they not only refused to pay the promised grant funds, but they also asked that I not release the findings. I complied, but even several years later they continued to release the data in the same vulnerable format. This gives us insight #2, results can be disruptive and data holders may want to silence results and take no corrective action.

I was asked to see if I could re-identify children on a cancer registry in Illinois. I submitted the names to the Department of Public Health to score, and when the names were found to be correct, a judge issued a court order banning me from ever disclosing how I did it. This gives us insight #3, data holders may seek legal maneuvers to silence results.

My experiences go on in this vein, with great difficulty publishing results. Most recently, others began to question whether there are any vulnerabilities at all, and in fact, industry requested privacy standards be lowered because of the lack of evidence.

What about the data protection side? There are certainly lots of academic papers on ways to protect personal information. Recently, Khaled El Emam de-identified a medical dataset for use in a public contest. He published a paper about his method and claimed that his approach was sufficient for public use. I then asked him if he would score my results if I attempted to re-identify the people in the data. My request was met with threats of lawsuits. So, his paper stands unchallenged and the data are shared widely without question. This gives us insight #4, if data protection schemes cannot be challenged, we cannot scientifically attest to their viability.

Now is the time to grow re-identification as a field, to perform re-identifications and to develop effective solutions that move society forward in protecting personal information as personal data are shared widely for many worthy purposes and in many different contexts. To help us launch, we have already begun a series of new re-identification experiments. Our re-identification of the Personal Genome Project will be talked about later (more, PDF), and we have many more planned. But we recognize that re-identifications can be disruptive. For this reason, we need guidance, which is where this Workshop comes in. We hope you will help us brainstorm on issues and move towards best practices.

Throughout the day you can post a comment on any of the workshop topics, share references and links, and sign up for our mailing list.

Visit the Workshop webpage.

Thank you again for attending.





IQSS  |    Data Privacy Lab  |    Silent Spring Institute   |    Northeastern University