The Science of Privacy
Privacy, a Beast with Many Heads
Over the last few years, I have had the unique opportunity as a computer scientist to work across many different areas of scientific pursuit involving privacy. I have also spent a great deal of time working with lawyers involved in privacy-related lawsuits, and with policy makers on framing legislation, regulation and policy. Following the announcement of the new medical privacy legislation (HIPAA) in the USA, some of my patent-protected technologies were licensed to companies, so I have also enjoyed the vantage point of deploying privacy technology for commercial use. It has been an exciting few years!
While it is clear that policy makers often do not understand technology, and scientists typically do not understand the regulatory constraints within which their technology must reside, what is most disturbing is that the scientific communities developing privacy technology are themselves unaware of findings in other scientific communities. They often act in total ignorance, as if no other communities were pursuing privacy technology. As a result, a global view reveals contradictions across the communities and effort lost in re-inventing the wheel. What is needed is to move the discussion forward in pursuit of the science of privacy.
Below is a survey of some overlapping scientific communities currently addressing privacy technology.
Biometrics
Related content: methods by which humans are identified in data and methods for detecting and preventing identity theft.
Computer security
Related content: privacy problems from the standpoint of secure communications. Techniques that insulate against eavesdropping, provide secure payments, etc.
Computer theory
Related content: encryption, zero-knowledge proofs, multi-party computations, and the use of trusted third parties.
Database security
Related content: access controls and audit trails to identify intrusions and maintain accountability.
Medical informatics
Related content: privacy protection methods for sharing medical data, where data includes information provided in a wide variety of formats (e.g., clinical notes, images, DNA sequences, lab results, prescriptions, web logs).
Policy specification and enforcement
Related content: methods to automatically express and enforce privacy policies and preferences. Two threads: digital rights management (DRM) and privacy policy specifications (e.g., P3P) fuel this emerging community, expanding attention into enterprise-wide solutions and notions of privacy rights management.
Privacy-preserving data mining
Related content: privacy protection (primarily by outlier detection and uses of suppression and additive noise) such that data mining algorithms continue to provide reasonable results from protected data.
Statistical disclosure control
Related content: methods to provide tabular and field-structured data such that unusual information is masked or removed while statistical properties in field-structured data remain.
Trustworthy computing
Related content: methods to push security and privacy problems into the programming environment, or alternatively, to hold the modules within the computational environment responsible for prohibiting violations.
Ubiquitous computing and semantic web
Related content: a variant of policy specification and enforcement research (though separately identified) focused on networks of sensors, cameras, and web locations.
Other areas
Privacy technology analyses from Economic, Legal, and Policy perspectives.
As a first effort to develop this as a scientific field, I joined with Michael Shamos and we launched the Journal of Privacy Technology (www.jopt.org). The idea is not to remove papers from their constituent communities (or "application domains") but to provide a forum in which findings can be shared across existing communities. Results can then be fed to other application domains, thereby completing the knowledge cycle.
As an example of the need for the JOPT, my k-anonymity paper originated from work in the medical computing community. It addressed the problem of how to share medical records for research purposes with provable guarantees that the patients whose information is contained in the data could not be re-identified.
The formal protection model described is called k-anonymity. The CS theory community, in the development of k-anonymous messaging, claims that optimal k-anonymity using generalization and suppression is NP-hard. The statistical disclosure control community, in pursuit of ad hoc schemes for anonymizing data for public sharing, considers k-anonymity to be NP-complete. The real-world systems that are commercially available are N². This is a serious problem, because these NP findings have led to subsequent work that is N³ and worse, and to approximations that are weak compared with the real-world solutions. In many cases, the lack of a forum in which to discuss the circumstances that account for these differences has allowed a tremendous waste of resources.
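For readers from communities unfamiliar with the model, here is a minimal sketch of what the k-anonymity requirement asks of a released table: every combination of quasi-identifier values must be shared by at least k records. The field names, the sample records, and the helper `generalize_zip` are hypothetical and chosen only for illustration. This is not the algorithm used by any of the commercial systems mentioned above, and it does not search for an optimal generalization.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    appears in at least k rows of the table."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in counts.values())


def generalize_zip(row, digits):
    """Hypothetical generalization step: keep the first `digits`
    characters of the ZIP code and mask the remainder with '*'."""
    out = dict(row)
    out["zip"] = row["zip"][:digits] + "*" * (len(row["zip"]) - digits)
    return out


# Made-up records; "zip" and "birth_year" are assumed to be the quasi-identifiers.
records = [
    {"zip": "15213", "birth_year": 1965, "diagnosis": "asthma"},
    {"zip": "15217", "birth_year": 1965, "diagnosis": "flu"},
    {"zip": "15232", "birth_year": 1970, "diagnosis": "diabetes"},
    {"zip": "15235", "birth_year": 1970, "diagnosis": "asthma"},
]
QI = ["zip", "birth_year"]

# Each raw record has a unique (zip, birth_year) pair, so the raw table fails for k = 2.
raw_ok = is_k_anonymous(records, QI, 2)

# Generalizing ZIP to its first three digits groups the records into pairs.
generalized = [generalize_zip(r, 3) for r in records]
generalized_ok = is_k_anonymous(generalized, QI, 2)

print(raw_ok, generalized_ok)
```

Checking the property is the easy part. The disagreements described above concern the cost of finding a good generalization and suppression of the data that satisfies it.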
In the future, I see these scientific communities coming together to explore the science of privacy so that meta-theories and fundamental findings can be shared, and the JOPT will be an instrument to make that happen. In order to establish the editorial boards of the JOPT, we surveyed the privacy work in each of the communities and took the top 3 names in each community. Two of them were asked to join the JOPT Editorial Board and one was asked to join the JOPT Advisory Board. Before the JOPT, there was no place where work on the first principles of the science of privacy could even be published! Now, the JOPT will be instrumental in pioneering this new area.
Latanya Sweeney
March 2004.