Personal Genome Project

Do we know who you are?

Before the GET 2013 conference, we (members of the Data Privacy Lab at Harvard University) matched full names and addresses to some of the publicly available profiles of participants in the Personal Genome Project (PGP). From a qualified sample, we correctly identified more than 200 PGP participants. For the profiles to which we confidently assigned names, our accuracy was 84 to 97 percent.

Our work tells a PGP participant whether we could learn their name and address from information on their profile. Using our online servers, you can check yourself or others at any time. If you want to change your CCR, you can edit it at any time. See the section "What You Can Do" below.

We will be at the GET Conference 2013 in Boston. If you are a PGP participant, drop by our table and see whether we correctly matched you to your profile. We can tell you how we did it and what you can do to stop others from doing the same while still keeping your profile useful for research.

Check Yourself

Use our identifiability server at https://aboutmyinfo.org/ to determine how unique you are (and therefore how easy it is to identify you) from your ZIP code, date of birth, and gender. Then determine what you should do to reduce your chances of being re-identified. Anyone can check themselves, not only PGP participants.
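To make the idea of uniqueness concrete, here is a minimal sketch in Python. It is not the code behind the identifiability server. The field names (`zip`, `dob`, `gender`), the helper functions, and the toy population are all assumptions invented for illustration. The sketch counts how many records share each combination of ZIP code, date of birth, and gender. A person is unique when their combination appears exactly once.

```python
from collections import Counter

def group_sizes(records):
    """Count how many records share each (zip, dob, gender) combination."""
    return Counter((r["zip"], r["dob"], r["gender"]) for r in records)

def is_unique(person, records):
    """Return True when no other record shares the person's three values."""
    key = (person["zip"], person["dob"], person["gender"])
    return group_sizes(records)[key] == 1

# Hypothetical toy population, invented for illustration only.
population = [
    {"zip": "02138", "dob": "1970-03-14", "gender": "F"},
    {"zip": "02138", "dob": "1970-03-14", "gender": "M"},
    {"zip": "02139", "dob": "1982-11-02", "gender": "M"},
    {"zip": "02139", "dob": "1982-11-02", "gender": "M"},
]

print(is_unique(population[0], population))  # True: no one else shares all three values
print(is_unique(population[2], population))  # False: two records share the combination
```

The smaller the group you fall into, the easier it is to single you out, which is why making any one of the three values less specific helps.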

What You Can Do

You can change the values in the fields that made this approach successful: date of birth and ZIP code. Making these values less specific makes it harder to link your name to your profile. You can also remove your name from documents you upload. More help on each of these appears below.

ZIP Code. Log in to your profile and select View or edit your public profile. Then scroll down to Geographic Information and select Edit. Reduce your ZIP code to its first three digits by replacing the last two with 'xx' (e.g., 02138 -> 021xx), or delete it altogether.
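The ZIP code change described above can be sketched as a small Python function. The function name `generalize_zip` is hypothetical, and the sketch only shows the transformation itself. You still make the actual change by hand on your profile page.

```python
import re

def generalize_zip(zip_code):
    """Keep the first three digits of a five-digit ZIP code and mask the rest."""
    if not re.fullmatch(r"[0-9]{5}", zip_code):
        raise ValueError(f"expected a five-digit ZIP code, got {zip_code!r}")
    return zip_code[:3] + "xx"

print(generalize_zip("02138"))  # 021xx
```

ZIP codes are handled as strings rather than numbers so that a leading zero, as in 02138, is preserved.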

Birthdate. Use our CCR (Continuity of Care Record) editor at https://mydatacan.hmdc.harvard.edu/pgp/ to change your date of birth so that it reports only the year of birth, or remove it altogether, in the CCR file you upload to the PGP. The CCR file is the file you uploaded to populate your PGP profile, referred to on your profile as a PHR (personal health record). Log in to your account and download your current CCR file, edit the date of birth using the editor at the link above, and then upload the modified file to the PGP. The PGP system will then re-populate your profile with the modified date of birth entry.
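As an illustration of what "report only year of birth" means, the following Python sketch reduces a full date to its year. It assumes the date is written in ISO form (YYYY-MM-DD), which may not match how your CCR file stores it. The function name `year_only` is hypothetical, and the sketch does not read or write CCR files. Use the CCR editor linked above for the real change.

```python
from datetime import date

def year_only(dob_iso):
    """Reduce an ISO date string (YYYY-MM-DD, an assumed format) to its four-digit year."""
    parsed = date.fromisoformat(dob_iso)  # raises ValueError on malformed input
    return f"{parsed.year:04d}"

print(year_only("1970-03-14"))  # 1970
```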

Embedded Names. Check the files you have uploaded for your name appearing in the filename or in the file itself. If you find your name, download the file to your computer. If your name appears in the filename, rename the file to remove your name. If your name appears within the document, open it in the appropriate program (e.g., a text editor for text files) and remove your name from the file. Once you have corrected the file, delete the original from your profile on the PGP and upload the modified file in its place.
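If you have many files, a short script can help with the check described above. The following Python sketch assumes you have already downloaded your files into a local folder. The function name `find_name` is hypothetical. It looks for a search term, such as your surname, in each filename and in each file's text, ignoring case. It reads files as plain text, so names inside binary formats such as PDFs or compressed archives may be missed and should be checked by hand.

```python
from pathlib import Path

def find_name(folder, name):
    """Return (path, where) pairs for files whose filename or text mentions `name`."""
    needle = name.lower()
    hits = []
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file():
            continue
        if needle in path.name.lower():
            hits.append((path, "filename"))
        # Read as text and skip undecodable bytes; this is only a rough check.
        text = path.read_text(encoding="utf-8", errors="ignore")
        if needle in text.lower():
            hits.append((path, "contents"))
    return hits

if __name__ == "__main__":
    # "my_pgp_files" and "doe" are placeholders for your own folder and surname.
    for path, where in find_name("my_pgp_files", "doe"):
        print(f"{path}: name found in {where}")
```

Search for each part of your name separately, since a filename such as jane_doe_genome.txt does not contain the exact text "jane doe".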

How We Did It

We used two methods to match participants' profiles to their names. The first relied on demographic information available on most participants' public profiles. In 2011 we mined 600 names from the PGP project that listed the ZIP code, date of birth, and gender of the participant. We then matched that information against voter lists and other public records to link the person's name, address, telephone number, etc. to their profile.
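The matching step can be illustrated with a small Python sketch. This is not the code we used, and every record, name, and field name in it is invented for illustration. It joins a list of profiles to a list of public records on ZIP code, date of birth, and gender, and it assigns a name only when exactly one public record matches.

```python
from collections import defaultdict

def link_profiles(profiles, voters):
    """Map profile id -> name when exactly one voter record shares the demographics."""
    by_key = defaultdict(list)
    for v in voters:
        by_key[(v["zip"], v["dob"], v["gender"])].append(v["name"])
    matches = {}
    for p in profiles:
        candidates = by_key.get((p["zip"], p["dob"], p["gender"]), [])
        if len(candidates) == 1:  # an ambiguous match is not a confident match
            matches[p["id"]] = candidates[0]
    return matches

# Hypothetical records, invented for illustration only.
profiles = [
    {"id": "profile-A", "zip": "02138", "dob": "1970-03-14", "gender": "F"},
    {"id": "profile-B", "zip": "02139", "dob": "1982-11-02", "gender": "M"},
]
voters = [
    {"name": "Alice Example", "zip": "02138", "dob": "1970-03-14", "gender": "F"},
    {"name": "Bob Example", "zip": "02139", "dob": "1982-11-02", "gender": "M"},
    {"name": "Carl Example", "zip": "02139", "dob": "1982-11-02", "gender": "M"},
]

print(link_profiles(profiles, voters))  # {'profile-A': 'Alice Example'}
```

In this toy data, profile-B stays unnamed because two public records share its values. Making a ZIP code or date of birth less specific works the same way: it enlarges the group of candidates so that no single name stands out.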

The second method was to examine records that participants uploaded to their profiles, such as files from 23andMe. Some of those records had the participant's name embedded somewhere in the file.

For details, see our paper at dataprivacylab.org/projects/pgp/1021-1.pdf

Contact Us

Visit our table at the GET 2013 conference. Email our director at latanya@mit.edu or follow @LatanyaSweeney on Twitter.

People (alphabetical)

Akua Abu
Sean Hooley
Latanya Sweeney
Julia Winn


Related Projects in the Data Privacy Lab

Identifiability
Re-identification, Trails
Re-identification, DNA
Genomic Privacy
Genomic System Evaluation
De-identification, Datafly
De-identification, Faces
De-identification, Text
De-identification, Utility


Copyright © 2013. President and Fellows Harvard University.   |   IQSS   |    Data Privacy Lab