Realistic Data for Testing Rule Mining Algorithms

Author: 
Cooper, Colin
Place: 
Hershey
Publisher: 
IGI Global
Date published: 
2008
Responsibility: 
Zito, Michele, jt.author
Editor: 
Wang, John
Journal Title: 
Encyclopedia of Data Warehousing and Mining, Second Edition
Source: 
Encyclopedia of Data Warehousing and Mining, Second Edition
Abstract: 

The association rule mining (ARM) problem is a well-established topic in the field of knowledge discovery in databases. The problem addressed by ARM is to identify a set of relations (associations) in a binary-valued attribute set which describe the likely coexistence of groups of attributes. To this end it is first necessary to identify sets of items that occur frequently, i.e. those subsets F of the available set of attributes I for which the support (the number of times F occurs in the dataset under consideration) exceeds some threshold value. Other criteria are then applied to these item-sets to generate a set of association rules, i.e. relations of the form A → B, where A and B represent disjoint subsets of a frequent item-set F such that A ∪ B = F. A vast array of algorithms and techniques has been developed to solve the ARM problem. The algorithms of Agrawal & Srikant (1994), Bayardo (1998), Brin et al. (1997), Han et al. (2000), and Toivonen (1996) are only some of the best-known heuristics. There has recently been growing interest in the class of so-called heavy-tail statistical distributions. Distributions of this kind have been used in the past to describe word frequencies in text (Zipf, 1949), the distribution of animal species (Yule, 1925), of income (Mandelbrot, 1960), scientific citation counts (Redner, 1998) and many other phenomena. They have been used recently to model various statistics of the web and other complex networks (Barabasi & Albert, 1999; Faloutsos et al., 1999; Steyvers & Tenenbaum, 2005).
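
The definitions in the abstract can be illustrated with a minimal brute-force sketch in Python. It is not one of the cited algorithms. The abstract does not name the "other criteria" used to select rules, so the confidence measure support(F) / support(A) is assumed here as one common choice. The function names, the toy transactions and the threshold values are all hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, min_support):
    """Map each item-set whose support exceeds `min_support` to its support.

    Brute-force enumeration by size; it stops at the first size with no
    frequent item-set, since a superset is never more frequent than a subset.
    """
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            count = support(candidate, transactions)
            if count > min_support:  # "exceeds some threshold value"
                result[candidate] = count
                found = True
        if not found:
            break
    return result


def association_rules(freq, min_confidence):
    """Rules A -> B with A, B disjoint, non-empty, and A | B = F frequent.

    Assumption: rules are selected by confidence = support(F) / support(A).
    """
    rules = []
    for itemset, count in freq.items():
        for r in range(1, len(itemset)):
            for left in combinations(sorted(itemset), r):
                a = frozenset(left)
                b = itemset - a
                # Every subset of a frequent item-set is itself frequent,
                # so freq[a] is always present.
                confidence = count / freq[a]
                if confidence >= min_confidence:
                    rules.append((a, b, confidence))
    return rules


# Hypothetical toy dataset: each transaction is a set of attributes.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
freq = frequent_itemsets(transactions, min_support=1)
rules = association_rules(freq, min_confidence=0.9)
```

On this toy data, five item-sets have support above 1, and the only rule that reaches confidence 0.9 is {butter} → {bread}. Enumerating every subset is exponential in the number of attributes, which is why practical miners prune the search space.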

CITATION: Cooper, Colin. Realistic Data for Testing Rule Mining Algorithms, edited by Wang, John. Hershey: IGI Global, 2008. Encyclopedia of Data Warehousing and Mining, Second Edition - Available at: https://library.au.int/realistic-data-testing-rule-mining-algorithms