Topic 4. Publication and reproducibility of re-identification results and scientific value.

Scientific results must be valid and reproducible, and many computer science venues will only publish re-identification results that come with solutions (what to do about the re-identification). What should be acceptable guidelines for a researcher engaged in a re-identification experiment? Should the researcher share the data? Must the data be left vulnerable so others can confirm results? How should the data be shared with other researchers? Should publication require a solution?

What are important issues? What are risks and harms? Which issues are most likely to occur, and if they do occur, which are most likely to have significant adverse impact?


Post 1
Publication should not require a solution. Publishing the problem allows other
researchers to tackle the solution and leverages greater resources than stifling
publication and keeping the problem to oneself. Replicating the exact experiment is less
important; other researchers should focus on transposing methods to new datasets in order
to confirm previous findings, rather than replicating the same experiment. This would
confirm the applicability and generalizability of the original research, which is more
important to the "greater good" of such research.

Post 2
Data sets should be kept private and encrypted; however, they should be made available to any proven researcher with IRB approval who wishes to reproduce results.
Publications should point out vulnerabilities and perhaps propose approaches to solutions, but the latter is not vital.

Post 3
The IRB deals with this fairly efficiently. If people follow IRB procedures correctly, that
should provide the protections you need. Re-identifications are considered exempt right now
because the information is already public, but maybe IRBs should rethink that.

Post 4
The researcher has the responsibility to publish his/her research on re-identified data
even if he/she doesn't have a solution, because it will allow other people in the space
to attempt to solve the problem.

Post 5
Publication should not require a solution, if it explicitly links the data to
people. A researcher should share his data but should not leave it vulnerable;
this shouldn't be required to confirm the results.

Post 6
Publication should not require a solution because it poses an undue burden on the
researchers, when they are doing a service in identifying these vulnerabilities in the
first place.

Post 7
This should be subject to the standards of IRB anyway.

Post 8
Is there a way to share data with peer reviewers, but not ultimately publish it, or keep it
delayed under some sort of NDA or embargo? Ideally you could publish the results
without publishing the data, and allow a journal's integrity/reputation to vouch for your
results (because its peer reviewers will have seen the data).

Post 9
Yes, if one is going to publish a methods section detailing how to get personal information, then that researcher should also publish a way to protect that information. :)

Post 10
The IRB has standards for keeping personally identifying information private, and I think these suffice for other researchers to access the data.

Post 11
Perhaps the journal can have a final stage in the review process where it receives the
data and confirms the results, and then the data does not need to be widely shared.

Post 12
The goal of revealing the problem should be finding a solution, but a solution should
not be required for publication. You publish the problem to give the whole academic
community the chance to work on solutions. You shouldn't refuse to inform people that
there is a problem because you don't yet have a solution.

Post 13
Publication should not require a solution. Research is helped by being able to
understand previous missteps. The researcher engaging in re-identification should be
aware of the implications of his or her work, and only include vulnerable information
that he or she has obtained consent for; however, this data should be included when
necessary for understanding the paper.

Re-identified data can be very harmful to the people being re-identified, even if they
have given consent.

Post 14
This might be crazy, but you're supposed to dream big. What if the U.S. / world
had an independent organization tasked with confirming results of studies whose
data is too sensitive to be made public? That way the results could be
scientifically scrutinized and also the subjects' privacy would be protected. Kind
of wild, but probably worth discussing.



IQSS | Data Privacy Lab | Silent Spring Institute | Northeastern University