- Title
The k-Nearest Neighbour Join: Turbo Charging the KDD Process.
- Authors
Böhm, Christian; Krebs, Florian
- Abstract
The similarity join has become an important database primitive for supporting similarity searches and data mining. A similarity join combines two sets of complex objects such that the result contains all pairs of similar objects. Two types of the similarity join are well-known, the distance range join, in which the user defines a distance threshold for the join, and the closest pair query or k-distance join, which retrieves the k most similar pairs. In this paper, we propose an important, third similarity join operation called the k-nearest neighbour join, which combines each point of one point set with its k nearest neighbours in the other set. We discover that many standard algorithms of Knowledge Discovery in Databases (KDD) such as k-means and k-medoid clustering, nearest neighbour classification, data cleansing, postprocessing of sampling-based data mining, etc. can be implemented on top of the k-nn join operation to achieve performance improvements without affecting the quality of the result of these algorithms. We propose a new algorithm to compute the k-nearest neighbour join using the multipage index (MuX), a specialised index structure for the similarity join. To reduce both CPU and I/O costs, we develop optimal loading and processing strategies.
- Subjects
DATA mining; DATABASE searching; SPATIAL analysis (Statistics); DATABASES; INDEXING
- Publication
Knowledge & Information Systems, 2004, Vol 6, Issue 6, p728
- ISSN
0219-1377
- Publication type
Article
- DOI
10.1007/s10115-003-0122-9
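
To make the three join operations named in the abstract concrete, the following is a minimal brute-force sketch in Python. It is not the paper's MuX-based algorithm and uses no index; it only illustrates what each operation returns. The function names, the Euclidean metric, and the tuple representation of points are assumptions chosen for illustration.

```python
import heapq
import math


def distance_range_join(r_points, s_points, eps):
    """All pairs (r, s) whose distance is at most the user-defined threshold eps."""
    return [(r, s) for r in r_points for s in s_points if math.dist(r, s) <= eps]


def k_distance_join(r_points, s_points, k):
    """The k closest pairs overall (closest pair query / k-distance join)."""
    pairs = ((r, s) for r in r_points for s in s_points)
    return heapq.nsmallest(k, pairs, key=lambda p: math.dist(p[0], p[1]))


def knn_join(r_points, s_points, k):
    """For each point r, its k nearest neighbours in s_points (k-nn join).

    Returns a dict mapping the index of r in r_points to a list of its
    neighbours, nearest first. Brute force: O(|R| * |S|) distance computations.
    """
    return {
        i: heapq.nsmallest(k, s_points, key=lambda s: math.dist(r, s))
        for i, r in enumerate(r_points)
    }
```

Note the difference in result size: the range join returns a data-dependent number of pairs, the k-distance join returns k pairs in total, and the k-nn join returns k neighbours for every point of the first set.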