Learning Robust Association Rules from Data
Example of an Association Rule
"People who buy diapers tend to buy beer." [Grocery data]
Examples of Robust Association Rules
"People who buy baby products tend to buy controlled substances, use credit cards and make purchases on Saturdays." [Grocery data]
"Democrats born in the 1960s and registered in the 1980s tend to be female." [Cambridge, MA Voter List]
"Republican Whites owning home tend to be females with no children." [Pittsburgh, PA Voter List for ZIP 15213]
Learning semantically useful association rules across all attributes of a relational table requires:
- more rigorous learning than traditional approaches afford.
- the invention of knowledge ratings for learned rules, not just statistical ratings.
See GenTree with knowledge ratings
Traditional algorithms began by learning rules over one attribute expressed in the domain values of that attribute (Srikant and Agrawal, 1995).
"People who buy diapers tend to buy beer" is an example from grocery purchases. In the second generation, Srikant and Agrawal (1996) introduced a hierarchy whose base values are those originally represented in the data, while values at higher levels of the hierarchy represent increasingly general concepts of the base values. Rules learned over the attribute using the hierarchy are termed generalized association rules (or cross-level rules).
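The step from single-attribute rules to generalized rules can be illustrated with a minimal sketch. The transactions, the one-level hierarchy, and the helper names (`generalize`, `support`, `confidence`) are all assumptions made for illustration; support and confidence are the standard statistical measures for association rules and are not taken from this page.

```python
# Hypothetical toy transactions; items and hierarchy are illustrative only.
transactions = [
    {"diapers", "beer"},
    {"diapers", "beer", "chips"},
    {"baby wipes", "wine"},
    {"diapers", "milk"},
    {"chips", "beer"},
]

# Assumed one-level hierarchy: base value -> more general concept.
parent = {
    "diapers": "baby products",
    "baby wipes": "baby products",
    "beer": "controlled substances",
    "wine": "controlled substances",
}

def generalize(transaction):
    """Return the transaction extended with the parents of its items."""
    return transaction | {parent[i] for i in transaction if i in parent}

def support(itemset, rows):
    """Fraction of rows that contain every item in itemset."""
    return sum(itemset <= row for row in rows) / len(rows)

def confidence(lhs, rhs, rows):
    """Support of lhs and rhs together, divided by the support of lhs."""
    return support(lhs | rhs, rows) / support(lhs, rows)

# Rule over base values: diapers -> beer.
base_conf = confidence({"diapers"}, {"beer"}, transactions)

# Generalized rule: baby products -> controlled substances.
extended = [generalize(t) for t in transactions]
general_conf = confidence({"baby products"}, {"controlled substances"}, extended)
```

In this toy data the base rule holds in two of the three diaper purchases, while the generalized rule holds in three of the four baby-product purchases, because the hierarchy lets the wipes-and-wine purchase count as evidence.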
"People who buy baby products tend to buy controlled substances" is an example of a generalized association rule. The work reported herein continues the evolution in the expressiveness of association rules to its broadest application: semantically rated rules learned from a large relational table having many attributes. Each attribute has an associated hierarchy. We find the rules that are formed by combining mixed levels of generalization across all attributes and that convey the maximum expression of information supported by the attribute hierarchies, parameter settings and data tuples. We term these robust rules. An example of a robust rule is:
"People who buy baby products tend to buy controlled substances, use credit cards and make purchases on Saturdays."
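A rule that mixes generalization levels across several attributes can be evaluated against a relational table roughly as follows. This is only a sketch under stated assumptions: the table, its attribute names (`item`, `also`, `payment`, `day`), the hierarchies, and the helpers are hypothetical, and the code neither implements the GenTree approach nor computes knowledge ratings. It only checks the ordinary support and confidence of one hand-written rule.

```python
# Hypothetical relational table: one tuple per purchase, four attributes.
rows = [
    {"item": "diapers", "also": "beer", "payment": "Visa", "day": "Saturday"},
    {"item": "baby wipes", "also": "wine", "payment": "MasterCard", "day": "Saturday"},
    {"item": "diapers", "also": "milk", "payment": "cash", "day": "Monday"},
    {"item": "chips", "also": "beer", "payment": "Visa", "day": "Friday"},
]

# Assumed per-attribute hierarchies (child -> parent); missing values are roots.
hierarchies = {
    "item": {"diapers": "baby products", "baby wipes": "baby products"},
    "also": {"beer": "controlled substances", "wine": "controlled substances"},
    "payment": {"Visa": "credit card", "MasterCard": "credit card"},
    "day": {"Saturday": "weekend", "Sunday": "weekend"},
}

def ancestors(attr, value):
    """Return the value followed by its generalizations, most specific first."""
    chain = [value]
    while chain[-1] in hierarchies[attr]:
        chain.append(hierarchies[attr][chain[-1]])
    return chain

def matches(row, condition):
    """True if every attribute in condition is met at some hierarchy level."""
    return all(v in ancestors(a, row[a]) for a, v in condition.items())

def rule_stats(lhs, rhs, table):
    """Return (support, confidence) of the rule lhs -> rhs over the table."""
    lhs_rows = [r for r in table if matches(r, lhs)]
    both = [r for r in lhs_rows if matches(r, rhs)]
    support = len(both) / len(table)
    confidence = len(both) / len(lhs_rows) if lhs_rows else 0.0
    return support, confidence

# Mixed levels: "day" stays at the base value, the others are generalized.
lhs = {"item": "baby products"}
rhs = {"also": "controlled substances", "payment": "credit card", "day": "Saturday"}
robust_support, robust_confidence = rule_stats(lhs, rhs, rows)
```

Matching against the ancestor chain lets each condition sit at whatever level of its hierarchy the rule names, which is what allows a single rule to combine a base value such as Saturday with generalized values such as credit card.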
Keywords: association rules, classification problems, data mining, knowledge acquisition, rule learning, hierarchical learning
- Y. Li and L. Sweeney. Adding Semantics and Rigor to Association Rule Learning: the GenTree Approach, Carnegie Mellon University, School of Computer Science, Tech Report, CMU ISRI 05-101. Pittsburgh: January 2005.
Full paper PDF.
- Y. Li and L. Sweeney. Learning Robust Rules from Data, Carnegie Mellon University, School of Computer Science, Tech Report, CMU ISRI 04-107, CMU-CALD-04-100. Pittsburgh: February 2004.
Abstract, Paper: 21 pages
- L. Sweeney. Achieving k-anonymity privacy protection using generalization and suppression. International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 10 (5), 2002; 571-588.
Paper: 18 pages in PS or PDF.
Copyright © 2011. President and Fellows Harvard University. | Data Privacy Lab