References to Generalized Association Rule Mining Literature

Generalized Association Rule Mining


Association rule mining has been the most popular form of data mining and there has been a lot of work in the area. But this page only surveys papers related to learning generalized association rules, especially with the use of a hierarchy (a more-general-than relation over the values an attribute can assume). Below is a list of publications (in chronological order) found in the computer science literature. (If you have an additional citation you deem essential to this collection, please let us know.)


  1. R. Srikant and R. Agrawal. Mining Generalized Association Rules. In Proc. of the 21st Int'l Conference on Very Large Databases, Zurich, Switzerland, September 1995. One of the first to introduce hierarchy/taxonomy into association rule learning. It focused on databases of transactions and is-a hierarchies, formally defined the concept of generalized association rule and designed and compared couple of rule mining algorithms based on frequent itemsets and apriori candidate generations.

  2. J. Han and Y. Fu. Discovery of multiple-level association rules from large databases. In Proc. of the 21st Int'l Conference on Very Large Databases, Zurich, Switzerland, September 1995. The other one of the first in the transactional database domain to incorporate the concept of hierarchy/taxonomy into association rule mining. It detailed a progressive deepening rule mining algorithm and variants based on frequent itemset and apriori strategies, which was aimed at finding multiple-level rules at same levels. The paper also had discussions on considering mixed hierarchies and learning rules involving different level concepts in a hierarchy.

  3. R. Srikant and R. Agrawal. Mining quantitative association rules in large relational tables. In Proceedings of the ACM SIGMOD Conference on Management of Data, Montreal, Canada, June 1996. Introduced the idea of quantitative association rules, an extension of generalized association rules considering both quantitative and categorical attibutes, in the domain of large relational tables. The paper emphasized on how to partition a quantitative attribute according to a data-dependant information loss measure -- partial completeness, and gave a rule mining algorithm which would combine adjacent partitions as necessary.

  4. D. Gunopulos, H. Mannila, and S. Saluja. Discovering all the most specific sentences by randomized algorithms. In Intl. Conf. on Database Theory, January 1997. Introduces an approach to find all maximal frequent itemsets, which were frequent itemsets of maximal size, i.e., extending them into any supersets would result in non-frequent itemsets. Given a complete set of maximal frequent itemsets, data miners could generate all other frequent itemsets by taking non-empty subsets of them. As compared with traditional approaches to find all frequent item, finding maximal frequent itemsets were proved to be computationally much cheaper.

  5. S. Brin, R. Motwani, and C. Silverstein. Beyond market baskets: Generalizing association rules to correlations. SIGMOD Record (ACM Special Interest Group on Management of Data), 26(2):265, 1997. Another direction of generalizing association rules. This paper pointed out association rules were estimating conditional probabilities of positive events, while the underlying correlation might be either positive or negative. By generalizing association rules to correlations, both positive and negative relations would be accounted for. Chi-squared test on contingency tables was proposed as a measure of correlation.

  6. R. Srikant, Q. Vu, and R. Agrawal. Mining association rules with item constraints. In Proc. 3rd Int. Conf. Knowledge Discovery and Data Mining, pages 67--73, 1997. Provided mechanisms for users to specify constraints when mining association rules. Boolean expressions over the presence or absence of items, as well as their descendants and/or antecedents if hierarchies are considered, could be inputted into the rule mining algorithms presented in this paper, thus execution time might be largely reduced as compared to mining complete set of rules and applying post-processing filtering.

  7. D. Tsur, J. D. Ullman, S. Abitboul, C. Clifton, R. Motwani, and S. Nestorov. Query flocks: A generalization of association-rule mining. In Proc. 1998 ACM-SIGMOD, pp 1--12. This paper was written from the database management perspective and presented a notation "query flock", which consisted of a parameterized query and a filter that selects values for the parameters by applying a condition to the query results. Apriori technique developped from association rule mining was generalized to apply to such query flocks.

  8. J. Hipp, A. Myka, R. Wirth, and U. Guntzer. A new algorithm for faster mining of generalized association rules. In Proc. 2nd PKKD, 1998. Presented an algorithm called Prutax, which combined several previous frequent itemset mining optimizations, as a way to discover generalized frequent itemsets faster. It was still locked in the traditional framework of "finding frequent itemset first". However, it did not take into consideration rules that could learn in depth in hierarchies, and further redundancy issues related to such rules.

  9. D. Lin and Z. Kedem. Pincer-search: A new algorithm for discovering the maximum frequent set. In 6th Intl. Conf. Extending Database Technology, March 1998. Introduces an approach to find all maximal frequent itemsets, which were frequent itemsets of maximal size, i.e., extending them into any supersets would result in non-frequent itemsets. Given a complete set of maximal frequent itemsets, data miners could generate all other frequent itemsets by taking non-empty subsets of them. As compared with traditional approaches to find all frequent item, finding maximal frequent itemsets were proved to be computationally much cheaper.

  10. R. Bayardo. Efficiently mining long patterns from databases. In ACM SIGMOD Conf. Management of Data, June 1998. Introduces an approach to find all maximal frequent itemsets, which were frequent itemsets of maximal size, i.e., extending them into any supersets would result in non-frequent itemsets. Given a complete set of maximal frequent itemsets, data miners could generate all other frequent itemsets by taking non-empty subsets of them. As compared with traditional approaches to find all frequent item, finding maximal frequent itemsets were proved to be computationally much cheaper.

  11. S. Thomas and S.a Sarawagi. Mining generalized association rules and sequential patterns using sql queries. In Proceedings of the 4 th Intl. Conf. on Knowledge Discovery and Data Mining (KDD'98), pages 344--348, New York City, New York, USA, August 1998. From the database management perspective, this paper provided recipes of mining generalized association rules and sequential patterns using SQL queries. It showed that it was possible to express mining computations significantly more complex than simple boolean associations in SQL, using the same framework.

  12. M. Zaki. Generating non-redundant association rules. In 6th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, August 2000. Discusses the redundancy of traditional frequent itemsets based association rule generation framework, and introduced the concept of closed frequent itemset and corresponding mining algorithms. A framework based on closed frequent itemsets could cut redundancy caused by traditional means dramatically, and without loss of information (whereas maximal frequent itemsets would). This presents the case where attributes are treated as categorical only, i.e., with 2-level hierarchies.

  13. M. Zaki and C. Hsiao. CHARM: An Efficient Algorithm for Closed Itemset Mining, 2nd SIAM International Conference on Data Mining, Arlington, April 2002. Discusses the redundancy of traditional frequent itemsets based association rule generation framework, and introduced the concept of closed frequent itemset and corresponding mining algorithms. A framework based on closed frequent itemsets could cut redundancy caused by traditional means dramatically, and without loss of information (whereas maximal frequent itemsets would). This presents the case where attributes are treated as categorical only, i.e., with 2-level hierarchies.

  14. C. Perng, H. Wang, S. Ma and J. Hellerstein. Discovery in Multi-attribute Data with User-defined Constraints, ACM SIGKDD Explorations Newsletter, Volume4, Issue 1, Pages: 56 - 64, June 2002. Presents the idea of learning frequent itemsets off transactional data rendered in various ways from original relational data: partitioning attributes into grouping and itemizing categories disjointly, and mining frequent itemsets based on these different settings. A lattice captures the partial ordering of these partitioning settings and thus improve efficiency. User-defined inner/inter-attribute constraints can be described by a language provided.


Related Links

This list was initially compiled by Yiheng Li and Latanya Sweeney. For additions or changes, please contact us.


Copyright © 2011. President and Fellows Harvard University.   |   IQSS   |    Data Privacy Lab   |    [info@dataprivacylab.org]