- Title
Partitioning strategies for distributed association rule mining.
- Authors
Frans Coenen; Paul Leng
- Abstract
In this paper a number of alternative strategies for distributed/parallel association rule mining are investigated. The methods examined make use of a data structure, the T-tree, introduced previously by the authors as a structure for organizing sets of attributes for which support is being counted. We consider six different approaches, representing different ways of parallelizing the basic Apriori-T algorithm that we use. The methods focus on different mechanisms for partitioning the data between processes, and for reducing the message-passing overhead. Both ‘horizontal’ (data distribution) and ‘vertical’ (candidate distribution) partitioning strategies are considered, including a vertical partitioning algorithm (DATA-VP) which we have developed to exploit the structure of the T-tree. We present experimental results examining the performance of the methods in implementations using JavaSpaces. We conclude that in a JavaSpaces environment, candidate distribution strategies offer better performance than those that distribute the original dataset, because of the lower messaging overhead, and the DATA-VP algorithm produced results that are especially encouraging.
- Subjects
DATA mining; DATABASE searching; DECISION support systems; KNOWLEDGE management; ALGORITHMS; JAVA programming language
- Publication
Knowledge Engineering Review, 2006, Vol 21, Issue 1, p25
- ISSN
0269-8889
- Publication type
Article
- DOI
10.1017/s0269888906000786
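
- Illustration
The abstract contrasts 'horizontal' (data distribution) and 'vertical' (candidate distribution) partitioning. The sketch below may help make that distinction concrete. It is a minimal illustration under stated assumptions, not the authors' Apriori-T or DATA-VP algorithm: it uses no T-tree and no JavaSpaces, the "workers" are simulated sequentially in one process, and all function names (`count_support`, `horizontal_counts`, `vertical_counts`) are hypothetical.

```python
from collections import Counter


def count_support(transactions, candidates):
    """Count how many transactions contain each candidate itemset."""
    counts = Counter()
    for transaction in transactions:
        items = set(transaction)
        for candidate in candidates:
            if set(candidate) <= items:
                counts[candidate] += 1
    return counts


def horizontal_counts(transactions, candidates, n_workers):
    """Data distribution: each simulated worker receives a slice of the
    records, counts every candidate on that slice, and the partial
    counts are summed."""
    total = Counter()
    for worker in range(n_workers):
        records = transactions[worker::n_workers]  # this worker's records
        total.update(count_support(records, candidates))
    return total


def vertical_counts(transactions, candidates, n_workers):
    """Candidate distribution: each simulated worker receives a slice of
    the candidates. Assumption for simplicity: every worker scans the
    full dataset, which a real vertical scheme would try to avoid."""
    total = Counter()
    for worker in range(n_workers):
        own = candidates[worker::n_workers]  # this worker's candidates
        total.update(count_support(transactions, own))
    return total


transactions = [
    ("a", "b", "c"),
    ("a", "b"),
    ("a", "c"),
    ("b", "c"),
    ("a", "b", "c"),
]
candidates = [("a", "b"), ("a", "c"), ("b", "c")]

by_rows = horizontal_counts(transactions, candidates, n_workers=2)
by_candidates = vertical_counts(transactions, candidates, n_workers=2)
```

Both functions yield the same support counts. What differs is what would have to be exchanged between processes: in the horizontal scheme every worker reports a partial count for every candidate, whereas in the vertical scheme each worker's counts are already final for the candidates it owns.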