References to Privacy-Preserving Data Mining Literature

Privacy-Preserving Data Mining

Data mining techniques are used to find patterns in large databases of information. But sometimes these patterns can reveal sensitive information about the data holder or about the individuals whose information is the subject of the patterns. The goal of privacy-preserving data mining is to identify and prevent such revelations in the kinds of patterns learned using traditional data mining techniques. Below are a list of key publications and a list of supporting publications found in the computer science literature. (If you have an additional citation you deem essential to this collection, please let us know.)

Key References

A. Evfimievski, J. E. Gehrke, and R. Srikant. Limiting Privacy Breaches in Privacy Preserving Data Mining. In Proceedings of the 22nd ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS 2003). San Diego, CA. June 2003. The authors define a privacy breach as a situation in which the server can infer some private information of a client with high probability. They present a new technique, "amplification", that limits the privacy breaches that can occur under ordinary randomization. The authors claim that it bounds privacy breaches without requiring any knowledge of the distribution of the original data.
A. Evfimievski, R. Srikant, R. Agrawal, and J. Gehrke. Privacy Preserving Mining of Association Rules. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). July 2002. "Uniform" randomization of the data may appear to preserve privacy, but it can be exploited to recover information about the data. The authors present several randomization techniques (e.g., cut-and-paste randomization) that preserve privacy better than uniform randomization.
B. Pinkas. Cryptographic techniques for privacy preserving data mining. In SIGKDD Explorations, 4(2). December 2002. The author considers data held in a distributed environment rather than in a central repository. The paper shows that multi-party constructions are more difficult to implement than two-party constructions. The ID3 algorithm for constructing decision trees is used as the example for classifying the data in the dataset.
C. Clifton, M. Kantarcioglu, J. Vaidya, X. Lin, and M. Y. Zhu. Tools for Privacy Preserving Distributed Data Mining. In SIGKDD Explorations, 4(2): 28-34. December 2002. The authors argue that a toolkit of privacy-preserving computations can help in designing data mining techniques. They describe the components of the toolkit: secure sum (computes a sum securely), secure set union (the union is created and shared securely), secure size of set intersection (the size of the intersection of the datasets is computed securely), and scalar product (computes the scalar product of two vectors securely). They also present various applications of these privacy-preserving data mining solutions.
C. Farkas and S. Jajodia. The Inference Problem: A Survey. In SIGKDD Explorations, 4(2): 6-11. December 2002. The authors provide a survey of current and emerging research in data inference control. They review the inference problem in general-purpose databases, data mining, and web-based applications. They also relate the inference control problem to secure communication and mobile computing.
D. Agrawal and C. C. Aggarwal. On the design and quantification of privacy preserving data mining algorithms. In Proceedings of the Twentieth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. Santa Barbara, California, USA. May 21-23 2001. ACM. The authors show that some information is naturally lost when the original distribution is reconstructed from data that has been perturbed to preserve privacy. They propose an Expectation Maximization (EM) algorithm for the reconstruction, which they show performs better than the other available techniques. They show that the EM algorithm converges to the maximum-likelihood estimate of the original distribution.
D. E. O'Leary. Some Privacy Issues in Knowledge Discovery: The OECD Personal Privacy Guidelines. In IEEE Expert, v.10, n.2. April 1995. pp.48-52. The author discusses the legal frameworks that influence knowledge discovery in databases. The author gives a brief overview of the OECD (Organization for Economic Cooperation and Development) principles and relates them to knowledge discovery (data mining). The author also contrasts knowledge discovery about individuals with knowledge discovery about groups.
H. Kargupta, S. Datta, Q. Wang, and K. Sivakumar. On the Privacy Preserving Properties of Random Data Perturbation Techniques. In Proceedings of the Third IEEE International Conference on Data Mining (ICDM'03). Melbourne, Florida, USA. November 2003. pp.99-106. The authors question the use of randomization for preserving privacy in data mining and show that randomization alone does not fully preserve privacy: the original data can be retrieved from the randomized dataset. They also state explicitly the assumptions that existing systems make about preserving privacy and suggest ways of building a general framework that preserves privacy better.
I. Dinur and K. Nissim. Revealing information while preserving privacy. In Proceedings of the Twenty-Second ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. California. 2003. pp.202-210. The authors argue that the designer of a privacy-preserving system must strike a balance between hiding private information and revealing useful information to the end user. They examine various situations and kinds of queries issued to the database (e.g., an adversary issuing all possible queries to extract information from a small database). They also discuss privacy preservation against a time-bounded or query-bounded adversary and provide algorithms in which the perturbation magnitude depends directly on the time or query bound.
J. Vaidya and C. Clifton. Privacy Preserving Association Rule Mining in Vertically Partitioned Data. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Edmonton, AB, Canada. July 2002. pp.639-644. The authors provide a privacy-preserving association rule mining algorithm, which they show works efficiently when the data is distributed across many locations. They show that good privacy protection can be achieved in a distributed data setting at a reasonable communication cost.
M. Kantarcioglu and C. Clifton. Privacy-preserving distributed mining of association rules on horizontally partitioned data. In The ACM SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD'02). June 2 2002. The authors discuss cryptographic techniques that minimize the information shared while adding little overhead to the mining task. They address a scenario in which some parties are allowed to access some of the data, whereas other existing techniques assume a scenario in which the values are kept private from anyone performing the mining. They use this scenario to show the efficiency of their algorithm.
M. Kantarcioglu, J. Jin, and C. Clifton. When Do Data Mining Results Violate Privacy? In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2004). Seattle, WA, USA. August 2004. Privacy-preserving data mining techniques are used to produce results while protecting the privacy of the data in the datasets. This paper explores the question "Do the results themselves violate privacy?" It presents methods and metrics for evaluating various privacy-preserving data mining techniques.
M. Kantarcioglu and J. Vaidya. An Architecture for Privacy-preserving Mining of Client Information. In Proceedings of the IEEE ICDM Workshop on Privacy, Security and Data Mining. Maebashi City, Japan. December 2002. pp.37-42. The authors treat the databases as sites and describe three sites that together form the privacy-preserving architecture: (1) the Original Site (OS), where the information is collected and which also learns the results; (2) the Non-Colluding Storage Site (NSS), which stores the shared part of the user information; and (3) the Processing Site (PS), which performs the data mining. They show that, in their architecture, the OS cannot differentiate between users, the NSS learns no information, and the PS learns only aggregate information.
N. Zhang, S. Wang, and W. Zhao. A New Scheme on Privacy Preserving Association Rule Mining. In Proceedings of the 15th European Conference on Machine Learning (ECML) and the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD). Pisa, Italy. September 2004. The authors show that randomization is not the only way to preserve privacy in the data and introduce algebraic methods for doing so. They show that their technique discloses about five times less private transaction information than previous approaches. In evaluating their system, they use the error in the support of frequent item sets (false drops and false positives) as the accuracy metric and privacy breach as the privacy metric.
R. Agrawal and R. Srikant. Privacy-Preserving Data Mining. In Proceedings of the ACM SIGMOD Conference on Management of Data. Dallas, Texas. May 2000. pp.439-450. The authors ask the question "can we develop accurate models without access to precise information in individual data records?", which created a new area of research: preserving privacy in data mining procedures. They reconstruct the distribution, not the individual records, from a database that has been perturbed with a randomizing function to protect privacy. Using two techniques, ByClass and Local, they show that the true values can be kept hidden at the cost of only a small drop in accuracy, which they consider a desirable trade-off for privacy in many situations.
S. Agrawal, V. Krishnan, and J. R. Haritsa. On Addressing Efficiency Concerns in Privacy-Preserving Mining. In Proceedings of the 9th International Conference on Database Systems for Advanced Applications (DASFAA-2004). Jeju Island, Korea. March 2004. People are reluctant to provide personal information on websites because they fear that the organization will misuse it. To increase user confidence, a system called MASK (Mining Associations with Secrecy Konstraints) was developed, in which the information is distorted at the user's end using a simple probabilistic distribution, rather than by a third party or by the organization. The authors show that the efficiency of privacy-preserving mining can be brought well within an order of magnitude of that of ordinary data mining while maintaining a satisfactory level of privacy and accuracy.
S. R. M. Oliveira and O. R. Zaïane. Toward Standardization in Privacy-Preserving Data Mining. In Proceedings of the 3rd Workshop on Data Mining Standards (DM-SSP 2004), in conjunction with KDD 2004. Seattle, WA, USA. August 2004. This paper discusses required future work and outlines steps toward standardizing privacy-preserving data mining techniques. The authors analyze the implications of the Organization for Economic Cooperation and Development (OECD) data privacy principles and propose requirements for the development and deployment of solutions. They also give an overview of how the OECD principles can be adopted when applying privacy-preserving techniques to datasets.
V. S. Verykios, E. Bertino, I. N. Fovino, L. P. Provenza, Y. Saygin, and Y. Theodoridis. State-of-the-art in Privacy Preserving Data Mining. In SIGMOD Record, 33(1): 50-57. March 2004. The authors provide a survey of existing privacy-preserving data mining techniques. They classify the techniques along the following dimensions: data distribution, data modification, data mining algorithm, data or rule hiding, and privacy preservation. They define a parameter, "transversal endurance", that is used to evaluate the sanitization algorithms designed for various privacy-preserving techniques in different databases.
V. S. Verykios, A. K. Elmagarmid, E. Bertino, Y. Saygin, and E. Dasseni. Association Rule Hiding. In IEEE Transactions on Knowledge and Data Engineering, 16(4). April 2004. pp.434-447. The authors describe two approaches: (1) hiding the frequent sets to prevent the rules from being generated, and (2) reducing the importance of the rules by keeping their confidence below a threshold value. They provide five algorithms built on these two approaches. These algorithms perform minimal perturbation of the data values in the dataset.
Y. Lindell and B. Pinkas. Privacy Preserving Data Mining. In Proceedings of CRYPTO 2000, LNCS 1880, Springer-Verlag. Santa Barbara, CA. August 2000. pp.36-54. The authors discuss a data mining algorithm that runs on the union of two confidential databases whose owners do not want to share any information with each other. Using the ID3 algorithm, they show that neither of the interacting parties can learn anything about the data other than the output.
Y. Saygin, V. S. Verykios, and A. K. Elmagarmid. Privacy Preserving Association Rule Mining. In Proceedings of the 12th International Workshop on Research Issues in Data Engineering: Engineering E-Commerce/E-Business Systems (RIDE'02). San Jose, CA, USA. February 2002. The authors consider one category of data mining technique, association rule mining, and provide two algorithms for hiding rules in datasets: (1) hiding the rules by reducing the support of the item sets that generated them, and (2) reducing the confidence of the rules. They also show that the deterministic algorithms discussed in the paper could protect privacy better.
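
As an illustration of the kind of primitive summarized in the toolkit entry above (Clifton et al.), the following is a minimal sketch of a ring-based secure sum, simulated in a single Python process. It assumes the true total is known to lie in the range [0, modulus), that the sites pass one running value around a ring, and that no two sites collude. The function name secure_sum and its parameters are hypothetical and are not taken from any of the cited papers.

```python
import secrets


def secure_sum(local_values, modulus):
    """Simulate a ring-based secure sum over the given per-site values.

    local_values[i] is the private, non-negative value held by site i.
    modulus is an agreed upper bound that is larger than the true total.
    """
    if not local_values:
        raise ValueError("at least one site is required")
    # A real deployment cannot check the total like this; the bound is an
    # agreed assumption. The simulation checks it only to catch misuse.
    if any(v < 0 for v in local_values) or sum(local_values) >= modulus:
        raise ValueError("values must be non-negative and sum below modulus")

    # The initiating site draws a uniform random mask known only to itself.
    mask = secrets.randbelow(modulus)

    # The initiator sends (mask + its own value) mod modulus to the next site.
    running = (mask + local_values[0]) % modulus

    # Each later site sees only a uniformly distributed running value,
    # adds its own value, and forwards the result around the ring.
    for value in local_values[1:]:
        running = (running + value) % modulus

    # The ring closes at the initiator, which removes its mask.
    return (running - mask) % modulus


if __name__ == "__main__":
    print(secure_sum([5, 12, 7], modulus=1000))
```

Because the mask is uniform over [0, modulus), each intermediate message on its own carries no information about the values added so far. The sketch does not address collusion between the neighbors of a site, which would let them recover that site's value by subtracting the message it received from the message it sent.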

Supporting References

A. Veloso, Wagner Meira Jr., S. Parthasarathy and M. B. Carvalho. Efficient, Accurate and Privacy-Preserving Data Mining for Frequent Itemsets in Distributed Databases. In Proceedings of the 18th Brazilian Symposium on Databases. Manaus, Amazonas, Brazil. October 2003. pp.281-292.
A. Evfimievski. Randomization in Privacy-Preserving Data Mining. In SIGKDD Explorations, 4(2): 43-48. December 2002.
A. J. Broder. Data Mining, the Internet, and Privacy. In B. M. Masand and M. Spiliopoulou (Eds.): Web Usage Analysis and User Profiling, International WEBKDD'99 Workshop. San Diego, California, USA. August 1999. pp.56-73.
A. Sanil, A. Karr, X. Lin, and J. Reiter. Privacy Preserving Regression Modeling Via Distributed Computation. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2004). Seattle, WA, USA. August 2004.
B. Brumen, I. Golob, T. Welzer, I. Rozman, M. Druzovec, and H. Jaakkola. An Algorithm for Protecting Knowledge Discovery Data. In INFORMATICA, 14(3): 277-288. December 2003.
B. Gilburd, A. Schuster, and R. Wolff. A New Privacy Model and Association-Rule Mining Algorithm for Large-Scale Distributed Environments. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2004). Seattle, WA, USA. August 2004.
B. Thuraisingham. Data Mining, National Security, Privacy and Civil Liberties. In SIGKDD Explorations, 4(2): 1-5. December 2002.
Bruno Gusmão Rocha, et al. Disclosing users' data in an environment that preserves privacy. Proceedings of the 2002 ACM workshop on Privacy in the Electronic Society. Washington, DC. 2002. pp. 71 - 80.
C. Boyens, O. Günther and M. Teltzrow. Privacy Conflicts in CRM Services for Online Shops: A Case Study. In Proceedings of the IEEE ICDM Workshop on Privacy, Security and Data Mining. Maebashi City, Japan. December 2002. pp.27-35.
C. Clifton and D. Marks. Security and Privacy Implications of Data Mining. In Proceedings of the 1996 ACM SIGMOD Workshop on Data Mining and Knowledge Discovery. Montreal, Canada. June 1996. pp.15-19.
C. Clifton and G. Gengo. Developing Custom Intrusion Detection Filters Using Data Mining. In 2000 Military Communications International Symposium (MILCOM2000), Los Angeles, California. October 2000.
C. Clifton. Protecting Against Data Mining Through Samples. In Proceedings of the 13th Annual IFIP WG 11.3 Working Conference on Database Security. Seattle, WA, USA. July 1999.
C. Clifton. Using Sample Size to Limit Exposure to Data Mining. In Journal of Computer Security, v.8, n.4, IOS Press. November 2000. pp.281-307 (Invited paper).
C. W. Wu. Privacy Preserving Data Mining: A Signal Processing Perspective And A Simple Data Perturbation Protocol. In IEEE ICDM Workshop on Privacy Preserving Data Mining. Melbourne, Florida, USA. November 2003. pp. 10-17.
D. Agrawal and C. C. Aggarwal. On the Design and Quantification of Privacy Preserving Data Mining Algorithms. In Proceedings of the 20th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. Santa Barbara, California, USA. May 2001. pp.247-255.
D. Barbara, J. Couto, S. Jajodia, and N. Wu. ADAM: A Testbed for Exploring the Use of Data Mining in Intrusion Detection. In SIGMOD Record, v.30, n.4. December 2001.
D. E. O'Leary. Knowledge Discovery as a Threat to Database Security. In G. Piatetsky-Shapiro and W. J. Frawley (eds.): Knowledge Discovery in Databases. AAAI/MIT Press. Menlo Park, CA. 1991. pp.507-516.
D. Meng and K. Sivakumar. Privacy Sensitive Bayesian Network Parameter Learning. In Proceedings of the Fourth IEEE International Conference on Data Mining (ICDM 2004). Brighton, UK. November 2004.
David Hand, Heikki Mannila, and Padhraic Smyth. Principles of Data Mining. MIT Press. August 2001.
E. Dasseni, V. S. Verykios, A. K. Elmagarmid, and Elisa Bertino. Hiding Association Rules by Using Confidence and Support. In Proceedings of the 4th International Information Hiding Workshop (IHW). Pittsburgh, PA. April 2001. pp.369-383.
G. Piatetsky-Shapiro. Knowledge Discovery in Personal Data vs. Privacy: A mini-symposium. In IEEE Expert, v.10, n.2. April 1995. pp.46-47.
G. Schadow, S. J. Grannis and C. J. McDonald. Privacy-Preserving Distributed Queries for a Clinical Case Research Network. In Proceedings of the IEEE ICDM Workshop on Privacy, Security and Data Mining, Maebashi City, Japan, December 2002, pp.55-65.
B. Gilburd, A. Schuster, and R. Wolff. Privacy-Preserving Data Mining on Data Grids in the Presence of Malicious Participants. In Proceedings of the 13th IEEE International Symposium on High Performance Distributed Computing. June 4-6, 2004. pp.225-234.
H. Polat and W. Du. Privacy-Preserving Collaborative Filtering Using Randomized Perturbation Techniques. In Proceedings of the Third IEEE International Conference on Data Mining (ICDM'03). Melbourne, Florida, USA. November 2003. pp. 625-628.
J. B. D. Cabrera, L. Lewis, and R. K. Mehra. Detection and Classification of Intrusions and Faults using Sequences of System Calls. In SIGMOD Record, v.30, n.4. December 2001.
J. Vaidya and C. Clifton. Privacy Preserving Naïve Bayes Classifier for Vertically Partitioned Data. In Proceedings of the 2004 SIAM Conference on Data Mining. Lake Buena Vista, Florida, USA. April 2004.
J. Vaidya and C. Clifton. Privacy-Preserving K-Means Clustering over Vertically Partitioned Data. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Washington, DC, USA. August 2003. pp.206-215.
J. Vaidya and C. Clifton. Privacy-Preserving Outlier Detection. In Proceedings of the Fourth IEEE International Conference on Data Mining (ICDM 2004). Brighton, UK. November 2004.
Jaideep Vaidya and Chris Clifton. Privacy preserving association rule mining in vertically partitioned data. Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining. Edmonton, Alberta, Canada. 2002. pp. 639 – 644.
Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann. April 2000.
K. C. Laudon. Markets and privacy. Communications of the ACM, v.39 n.9. September 1996. pp.92-104.
K. Wang, P. Yu, and S. Chakraborty. Bottom-Up Generalization: A Data Mining Solution to Privacy Protection. In Proceedings of the Fourth IEEE International Conference on Data Mining (ICDM 2004). Brighton, UK. November 2004.
H. Kargupta, S. Datta, Q. Wang, and K. Sivakumar. On the Privacy Preserving Properties of Random Data Perturbation Techniques. In Proceedings of the Third IEEE International Conference on Data Mining (ICDM 2003). November 19-22, 2003. pp.99-106.
Keith B. Frikken and Mikhail J. Atallah. Privacy preserving electronic surveillance. Proceedings of the 2003 ACM workshop on Privacy in the electronic society. Washington, DC. 2003. pp. 45 - 52.
L. Brankovic and V. Estivill-Castro. Privacy Issues in Knowledge Discovery and Data Mining. In Proceedings of the Australian Institute of Computer Ethics Conference (AICEC99). Melbourne, Australia. July 1999. pp.89-99.
L. Mé and C. Michel. Intrusion Detection: A Bibliography. In Technical Report SSIR-001-01. Supélec, Rennes, France. September 2001.
M. Atallah, E. Bertino, A. Elmagarmid, M. Ibrahim and V. Verykios. Disclosure Limitation of Sensitive Rules. In Proceedings of the IEEE Knowledge and Data Engineering Exchange Workshop (KDEX'99). Chicago, IL. November 1999. pp. 45-52.
M. Blum and S. Goldwasser. An efficient probabilistic public-key encryption that hides all partial information. In R. Blakely, editor, Advances in Cryptology - Crypto 84 Proceedings. Springer-Verlag. 1984.
M. Kantarcioglu and J. Vaidya. Privacy Preserving Naive Bayes Classifier for Horizontally Partitioned Data. In IEEE ICDM Workshop on Privacy Preserving Data Mining. Melbourne, Florida, USA. November 2003. pp. 3-9.
M. Olivier. Database Privacy. In SIGKDD Explorations, 4(2): 20-27. December 2002.
Md. Z. Islam and L. Brankovic. A Framework for Privacy Preserving Data Mining. In Proceedings of the Australasian Workshop on Data Mining and Web Intelligence (DMWI 2004), Dunedin, New Zealand. January 2004. pp. 163-168.
Md. Z. Islam and L. Brankovic. Noise Addition for Protecting Privacy in Data Mining. In Proceedings of the 6th Engineering Mathematics and Applications Conference (EMAC 2003). Sydney, Australia. 2003.
Md. Z. Islam, P. M. Barnaghi, and L. Brankovic. Measuring Data Quality: Predictive Accuracy vs. Similarity of Decision Trees. In Proceedings of the 6th International Conference on Computer and Information Technology (ICCIT 2003). Dhaka, Bangladesh. December 2003.
Moni Naor, Benny Pinkas, Reuben Sumner. Privacy preserving auctions and mechanism design. Proceedings of the 1st ACM conference on Electronic commerce. Denver, Colorado, United States. 1999 pp. 129 – 139.
Murat Kantarcioglu, Chris Clifton. Assuring privacy when big brother is watching. Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery. San Diego, California. 2003. pp. 88 - 93.
Murat Kantarcioglu, Chris Clifton. Privacy-preserving Distributed Mining of Association Rules on Horizontally Partitioned Data, The ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD'2002). Madison, Wisconsin. June 2, 2002.
Murat Kantarcioglu and Jaideep Vaidya. A new architecture for Privacy Preserving Data Mining. In Volume 14 - Privacy, Security and Data Mining of the ACS Series Conferences in Research and Practice in Information Technology.
Nabil R. Adam and John C. Wortmann. Security-control methods for statistical databases: A comparative study. ACM Computing Surveys, 21(4):515-556. December 1989.
O. De Vel, A. Anderson, M. Corney, and G. Mohay. Mining Email Content for Author Identification Forensics. In SIGMOD Record, v.30, n.4. December 2001.
Office of the Information and Privacy Commissioner. Data Mining: Staking a Claim on Your Privacy. Toronto, Ontario. January 1998.
R. Agrawal, A. Arning, T. Bullinger, M. Mehta, J. Shafer, R. Srikant. "The Quest Data Mining System". Proc. of the 2nd Int'l Conference on Knowledge Discovery in Databases and Data Mining. Portland, Oregon. August, 1996.
R. Agrawal, T. Imielinski, A. Swami: Database Mining: A Performance Perspective, IEEE Transactions on Knowledge and Data Engineering, Special issue on Learning and Discovery in Knowledge-Based Databases, Vol. 5, No. 6. December 1993. 914-25.
R. Mukkamala, J. Gagnon, and S. Jajodia. Integrating Data Mining Techniques with Intrusion Detection. In V. Atluri and J. Hale, editors, Research Advances in Database and Information Systems Security, Kluwer Publishers. 2000. pp.33-46.
R. Wright and Z. Yang. Privacy-Preserving Bayesian Network Structure Computation on Distributed Heterogeneous Data. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2004). Seattle, WA, USA. August 2004.
Rakesh Agrawal, Alexandre Evfimievski, and Ramakrishnan Srikant. Information sharing across private databases. In Proceedings of ACM SIGMOD International Conference on Management of Data. San Diego, CA. June 9-12 2003.
Rakesh Agrawal and Ramakrishnan Srikant. Fast algorithms for mining association rules. In Proceedings of the Twentieth International Conference on Very Large Data Bases. Santiago, Chile. September 12-15 1994. pp 487-499. VLDB.
Robert M. Arlein, et al. Privacy-preserving global customization. Proceedings of the 2nd ACM conference on Electronic commerce. Minneapolis, Minnesota, United States. 2000. pp. 176 - 184
S. J. Rizvi and J. R. Haritsa. Privacy-Preserving Association Rule Mining. In Proceedings of 28th International Conference on Very Large Data Bases. VLDB. Hong Kong, China. August 2002.
S. J. Stolfo, W. Lee, P. K. Chan, W. Fan, and E. Eskin. Data Mining-based Intrusion Detectors: An Overview of the Columbia IDS Project. In SIGMOD Record, v.30, n.4. December 2001. pp.45-54.
S. Merugu and J. Ghosh. Privacy-Preserving Distributed Clustering Using Generative Models. In Proceedings of the Third IEEE International Conference on Data Mining (ICDM'03). Melbourne, Florida, USA. November 2003. pp. 211-218.
S. R. M. Oliveira and O. R. Zaïane. Achieving Privacy Preservation When Sharing Data For Clustering. In Proceedings of the International Workshop on Secure Data Management in a Connected World (SDM'04) in conjunction with VLDB 2004. Toronto, Canada. August 2004.
S. R. M. Oliveira and O. R. Zaïane. Algorithms for Balancing Privacy and Knowledge Discovery in Association Rule Mining. In Proceedings of the 7th International Database Engineering and Applications Symposium (IDEAS 2003). Hong Kong, China. July 2003. pp.54-63.
S. R. M. Oliveira and O. R. Zaïane. Foundations for an Access Control Model for Privacy Preservation in Multi-Relational Association Rule Mining. In Proceedings of the IEEE ICDM Workshop on Privacy, Security and Data Mining. Maebashi City, Japan. December 2002. pp.19-26.
S. R. M. Oliveira and O. R. Zaïane. Privacy Preserving Clustering By Data Transformation. In Proceedings of the 18th Brazilian Symposium on Databases. Manaus, Amazonas, Brazil. October 2003. pp.304-318.
S. R. M. Oliveira and O. R. Zaïane. Protecting Sensitive Knowledge By Data Sanitization. In Proceedings of the Third IEEE International Conference on Data Mining (ICDM'03). Melbourne, Florida, USA. November 2003. pp. 613-616.
S. R. M. Oliveira and O. R. Zaïane. Privacy Preserving Frequent Itemset Mining. In Proceedings of the IEEE ICDM Workshop on Privacy, Security and Data Mining. Maebashi City, Japan. December 2002. pp.43-54.
T. Johnsten and V. V. Raghavan. A Methodology for Hiding Knowledge in Databases. In Proceedings of the IEEE ICDM Workshop on Privacy, Security and Data Mining. Maebashi City, Japan. December 2002. pp.9-17.
T. Johnsten and V. V. Raghavan. Impact of Decision-Region Based Classification Mining Algorithms on Database Security. In Proceedings of the 13th Annual IFIP WG 11.3 Working Conference on Database Security. Seattle, WA, USA. July 1999. pp.177-191.
T. Johnsten and V. V. Raghavan. Security Procedures for Classification Mining Algorithms. In Proceedings of the 15th Annual IFIP WG 11.3 Working Conference on Database and Applications Security. Niagara on the Lake, ON, Canada. July 2001. pp.293-309.
T. Mielikainen. On inverse Frequent Set Mining. In IEEE ICDM Workshop on Privacy Preserving Data Mining. Melbourne, Florida, USA. November 2003. pp. 18-23.
V. Estivill-Castro and L. Brankovic. Data Swapping: Balancing Privacy against Precision in Mining for Logic Rules. In Proceedings of the First International Data Warehousing and Knowledge Discovery (DaWaK'99). Florence, Italy. August 30 - September 1999. pp.389-398.
Privacy Law and Policy Reporter, v.6, n.3. September 1999. pp.33-35.
V. S. Iyengar. Transforming Data to Satisfy Privacy Constraints. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Edmonton, AB, Canada. July 2002. pp.279-288.
W. Du and M. J. Atallah. Protocols for secure remote database access with approximate matching. In Proc. of the First Workshop on Security and Privacy in E Commerce. Nov. 2000.
W. Du and M. J. Atallah. Privacy-Preserving Cooperative Scientific Computations. In Proceedings of the 14th IEEE Computer Security Foundations Workshop (CSFW'01). Cape Breton, Nova Scotia, Canada. June 2001. pp.273-285.
W. Du and Z. Zhan. Building Decision Tree Classifier on Private Data. In Proceedings of the IEEE ICDM Workshop on Privacy, Security and Data Mining, Maebashi City, Japan. December 2002. pp.1-8.
W. Du and Z. Zhan. Using Randomized Response Techniques for Privacy-Preserving Data Mining. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Washington, DC, USA. August 2003. pp.505-510.
W. Du, Y. S. Han, and S. Chen. Privacy-Preserving Multivariate Statistical Analysis: Linear Regression and Classification. In Proceedings of the 2004 SIAM Conference on Data Mining. Lake Buena Vista, Florida, USA. April 2004.
W. Fan, W. Lee, S. J. Stolfo, and M. Miller. A Multiple Model Cost-Sensitive Approach for Intrusion Detection. In Proceedings of the 11th European Conference on Machine Learning (ECML00), Barcelona Spain. May 2000. pp.148-156.
W. Lee and D. Xiang. Information-Theoretic Measures for Anomaly Detection. In Proceedings of the IEEE Symposium on Security and Privacy. Oakland, CA, USA. May 2001. pp.130-143.
W. Lee and S. J. Stolfo. A Framework for Constructing Features and Models for Intrusion Detection Systems. In ACM Transactions on Information and System Security, v.3, n.4. November 2000. pp.227-261.
W. Lee and S. Stolfo. Data Mining Approaches for Intrusion Detection. In Proceedings of the 7th USENIX Security Symposium. January 1998. pp 79-93.
W. Lee and W. Fan. Mining System Audit Data: Opportunities and Challenges. In SIGMOD Record, v.30, n.4. December 2001.
W. Lee, S. J. Stolfo, and K. Mok. Adaptive Intrusion Detection: A data Mining Approach. Artificial Intelligence Review, v.14, 2001. pp.533-567.
W. Lee, S. J. Stolfo, P. K. Chan, E. Eskin, W. Fan, M. Miller, S. Hershkop, and J. Zhang. Real Time Data Mining-based Intrusion Detection. In Proceedings of DARPA Information Survivability Conference and Exposition (DISCEX-II 2001), Anaheim, CA, USA. June 2001. pp.85-100.
W. Lee, W. Fan, M. Miller, S. J. Stolfo, and E. Zadok. Toward Cost-Sensitive Modeling for Intrusion Detection and Response. In Proceedings of the 1st ACM Workshop on Intrusion Detection Systems, 2000. (Also available as Technical Report CUCS-002-00, Computer Science, Columbia University. 2000.)
W. Lee. Applying Data Mining to Intrusion Detection: The Quest for Automation, Efficiency, and Credibility. In SIGKDD Explorations, 4(2): 35-42. December 2002.
Wenliang Du. A Study of Several Specific Secure Two-party Computation Problems. PhD thesis, Purdue University, West Lafayette, Indiana. 2001.
Wenliang Du and Mikhail J. Atallah. Privacy-preserving statistical analysis. In Proceedings of the Seventeenth Annual Computer Security Applications Conference. New Orleans, LA. December 10-14 2001.
Willi Klösgen. Anonymization Techniques for Knowledge Discovery in Databases. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD-95). Montreal, Canada. August 1995. AAAI Press, ISBN 0-929280-82-2. pp.186-191.
Willi Klösgen. KDD: Public and Private Concerns. In IEEE Expert, v.10, n.2. April 1995. pp.55-57.
Di, H. Liu, A. Ramineni, and A. Sen. Detecting Hidden Information in Images: A Comparative Study. In IEEE ICDM Workshop on Privacy Preserving Data Mining. Melbourne, Florida, USA. November 2003. pp. 24-30.
Y. Saygin, V.S. Verykios and C. Clifton. Using Unknowns to Prevent Discovery of Association Rules. In SIGMOD Record, v.30, n.4. December 2001.
Y. Zhu and L. Liu. Optimal Randomization for Privacy Preserving Data Mining. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2004). Seattle, WA, USA. August 2004.
