Böhm, Christian; Krebs, Florian

doi:10.1007/s10115-003-0122-9

Title: The k-Nearest Neighbour Join: Turbo Charging the KDD Process.
Authors: Böhm, Christian; Krebs, Florian
Abstract: The similarity join has become an important database primitive for supporting similarity searches and data mining. A similarity join combines two sets of complex objects such that the result contains all pairs of similar objects. Two types of the similarity join are well-known, the distance range join, in which the user defines a distance threshold for the join, and the closest pair query or k-distance join, which retrieves the k most similar pairs. In this paper, we propose an important, third similarity join operation called the k-nearest neighbour join, which combines each point of one point set with its k nearest neighbours in the other set. We discover that many standard algorithms of Knowledge Discovery in Databases (KDD) such as k-means and k-medoid clustering, nearest neighbour classification, data cleansing, postprocessing of sampling-based data mining, etc. can be implemented on top of the k-nn join operation to achieve performance improvements without affecting the quality of the result of these algorithms. We propose a new algorithm to compute the k-nearest neighbour join using the multipage index (MuX), a specialised index structure for the similarity join. To reduce both CPU and I/O costs, we develop optimal loading and processing strategies.
Subjects: DATA mining; DATABASE searching; SPATIAL analysis (Statistics); DATABASES; INDEXING
Publication: Knowledge & Information Systems, 2004, Vol 6, Issue 6, p728
ISSN: 0219-1377
Publication type: Article
DOI: 10.1007/s10115-003-0122-9
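
Illustration: the abstract defines the k-nearest neighbour join as combining each point of one point set with its k nearest neighbours in the other set. The sketch below shows only that definition, using a brute-force scan in Python. It is not the MuX-based algorithm proposed in the paper, and the names knn_join, r_points and s_points, the Euclidean distance and the sample data are all assumptions made for this example.

```python
import heapq
import math


def knn_join(r_points, s_points, k):
    """Brute-force k-NN join (illustrative only, not the paper's MuX algorithm).

    For every point in r_points, return the indices of its k nearest
    points in s_points under Euclidean distance.
    """
    result = {}
    for i, r in enumerate(r_points):
        # Pair each distance with the index of the S point, then keep the
        # k smallest pairs; ties in distance fall back to the lower index.
        candidates = ((math.dist(r, s), j) for j, s in enumerate(s_points))
        nearest = heapq.nsmallest(k, candidates)
        result[i] = [j for _, j in nearest]
    return result


# Hypothetical sample data: two small 2-D point sets.
R = [(0.0, 0.0), (5.0, 5.0)]
S = [(1.0, 0.0), (0.0, 2.0), (4.0, 5.0), (9.0, 9.0)]
pairs = knn_join(R, S, 2)
print(pairs)  # {0: [0, 1], 1: [2, 3]}
```

Unlike the distance range join, the result always holds exactly k partners per point of the first set (provided the second set has at least k points), and the operation is not symmetric in its two inputs. The scan above costs one distance computation per pair of points, which is the kind of CPU and I/O cost an index-based method such as the one described in the abstract aims to reduce.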
