Association rule mining has been the most popular form of data mining
and there has been a lot of work in the area. But this page only surveys
papers related to learning generalized association rules,
especially with the use of a hierarchy (a more-general-than relation
over the values an attribute can assume).
Below is a list of publications (in chronological order) found in the computer science literature.
(If you have an additional citation you deem essential to this collection, please let
us know.)
- R. Srikant and R. Agrawal. Mining Generalized Association
Rules. In Proc. of the 21st Int'l Conference on Very Large Databases,
Zurich, Switzerland, September 1995.
One of the first to introduce hierarchy/taxonomy into association rule
learning. It focused on databases of transactions and is-a hierarchies,
formally defined the concept of generalized association rule and designed
and compared couple of rule mining algorithms based on frequent itemsets and
apriori candidate generations.
- J. Han and Y. Fu. Discovery of multiple-level association rules from
large databases. In Proc. of the 21st Int'l Conference on Very Large
Databases, Zurich, Switzerland, September 1995.
The other one of the first in the transactional database domain to
incorporate the concept of hierarchy/taxonomy into association rule mining.
It detailed a progressive deepening rule mining algorithm and variants based
on frequent itemset and apriori strategies, which was aimed at finding
multiple-level rules at same levels. The paper also had discussions on
considering mixed hierarchies and learning rules involving different level
concepts in a hierarchy.
- R. Srikant and R. Agrawal. Mining quantitative association rules in
large relational tables. In Proceedings of the ACM SIGMOD Conference on
Management of Data, Montreal, Canada, June 1996.
Introduced the idea of quantitative association rules, an extension of
generalized association rules considering both quantitative and categorical
attibutes, in the domain of large relational tables. The paper emphasized
on how to partition a quantitative attribute according to a data-dependant
information loss measure -- partial completeness, and gave a rule mining
algorithm which would combine adjacent partitions as necessary.
- D. Gunopulos, H. Mannila, and S. Saluja. Discovering all the most
specific sentences by randomized algorithms. In Intl. Conf. on Database
Theory, January 1997.
Introduces an approach to find all maximal frequent
itemsets, which were frequent itemsets of maximal size, i.e., extending them
into any supersets would result in non-frequent itemsets. Given a complete
set of maximal frequent itemsets, data miners could generate all other
frequent itemsets by taking non-empty subsets of them. As compared with
traditional approaches to find all frequent item, finding maximal frequent
itemsets were proved to be computationally much cheaper.
- S. Brin, R. Motwani, and C. Silverstein. Beyond market
baskets: Generalizing association rules to correlations. SIGMOD Record (ACM
Special Interest Group on Management of Data), 26(2):265, 1997.
Another direction of generalizing association rules. This paper pointed
out association rules were estimating conditional probabilities of positive
events, while the underlying correlation might be either positive or
negative. By generalizing association rules to correlations, both positive
and negative relations would be accounted for. Chi-squared test on
contingency tables was proposed as a measure of correlation.
- R. Srikant, Q. Vu, and R. Agrawal. Mining association
rules with item constraints. In Proc. 3rd Int. Conf. Knowledge Discovery and
Data Mining, pages 67--73, 1997.
Provided mechanisms for users to specify constraints when mining
association rules. Boolean expressions over the presence or absence of
items, as well as their descendants and/or antecedents if hierarchies are
considered, could be inputted into the rule mining algorithms presented in
this paper, thus execution time might be largely reduced as compared to
mining complete set of rules and applying post-processing filtering.
- D. Tsur, J. D. Ullman, S. Abitboul, C. Clifton, R. Motwani, and S.
Nestorov. Query flocks: A generalization of association-rule mining. In
Proc. 1998 ACM-SIGMOD, pp 1--12.
This paper was written from the database management perspective and
presented a notation "query flock", which consisted of a parameterized query
and a filter that selects values for the parameters by applying a condition
to the query results. Apriori technique developped from association rule
mining was generalized to apply to such query flocks.
- J. Hipp, A. Myka, R. Wirth, and U. Guntzer. A new algorithm for faster
mining of generalized association rules. In Proc. 2nd PKKD, 1998.
Presented an algorithm called Prutax, which combined several previous
frequent itemset mining optimizations, as a way to discover generalized
frequent itemsets faster. It was still locked in the traditional framework
of "finding frequent itemset first". However, it did not take into
consideration rules that could learn in depth in hierarchies, and further
redundancy issues related to such rules.
- D. Lin and Z. Kedem. Pincer-search: A new algorithm for discovering
the maximum frequent set. In 6th Intl. Conf. Extending Database Technology,
March 1998.
Introduces an approach to find all maximal frequent
itemsets, which were frequent itemsets of maximal size, i.e., extending them
into any supersets would result in non-frequent itemsets. Given a complete
set of maximal frequent itemsets, data miners could generate all other
frequent itemsets by taking non-empty subsets of them. As compared with
traditional approaches to find all frequent item, finding maximal frequent
itemsets were proved to be computationally much cheaper.
- R. Bayardo. Efficiently mining long patterns from databases. In ACM
SIGMOD Conf. Management of Data, June 1998.
Introduces an approach to find all maximal frequent
itemsets, which were frequent itemsets of maximal size, i.e., extending them
into any supersets would result in non-frequent itemsets. Given a complete
set of maximal frequent itemsets, data miners could generate all other
frequent itemsets by taking non-empty subsets of them. As compared with
traditional approaches to find all frequent item, finding maximal frequent
itemsets were proved to be computationally much cheaper.
- S. Thomas and S.a Sarawagi. Mining generalized association rules
and sequential patterns using sql queries. In Proceedings of the 4 th Intl.
Conf. on Knowledge Discovery and Data Mining (KDD'98), pages 344--348, New
York City, New York, USA, August 1998.
From the database management perspective, this paper provided recipes of
mining generalized association rules and sequential patterns using SQL
queries. It showed that it was possible to express mining computations
significantly more complex than simple boolean associations in SQL, using
the same framework.
- M. Zaki. Generating non-redundant association rules. In 6th ACM SIGKDD
Int'l Conf. Knowledge Discovery and Data Mining, August 2000.
Discusses the redundancy of traditional frequent
itemsets based association rule generation framework, and introduced the
concept of closed frequent itemset and corresponding mining algorithms. A
framework based on closed frequent itemsets could cut redundancy caused by
traditional means dramatically, and without loss of information (whereas
maximal frequent itemsets would). This presents the case
where attributes are treated as categorical only, i.e., with 2-level hierarchies.
- M. Zaki and C. Hsiao. CHARM: An Efficient Algorithm for Closed
Itemset Mining, 2nd SIAM International Conference on Data Mining, Arlington,
April 2002.
Discusses the redundancy of traditional frequent
itemsets based association rule generation framework, and introduced the
concept of closed frequent itemset and corresponding mining algorithms. A
framework based on closed frequent itemsets could cut redundancy caused by
traditional means dramatically, and without loss of information (whereas
maximal frequent itemsets would). This presents the case
where attributes are treated as categorical only, i.e., with 2-level hierarchies.
- C. Perng, H. Wang, S. Ma and J. Hellerstein. Discovery in Multi-attribute Data with User-defined Constraints, ACM SIGKDD Explorations Newsletter, Volume4, Issue 1, Pages: 56 - 64,
June 2002.
Presents the idea of learning frequent itemsets off transactional data rendered in various
ways from original relational data: partitioning attributes into grouping and
itemizing categories disjointly, and mining frequent itemsets based on these
different settings. A lattice captures the partial ordering of these partitioning settings
and thus improve efficiency. User-defined inner/inter-attribute constraints can be described
by a language provided.
Related Links