- Title
Fast mining of distance-based outliers in high-dimensional datasets.
- Authors
Ghoting, Amol; Parthasarathy, Srinivasan; Otey, Matthew Eric
- Abstract
Defining outliers by their distance to neighboring data points has been shown to be an effective non-parametric approach to outlier detection. In recent years, many research efforts have looked at developing fast distance-based outlier detection algorithms. Several of the existing distance-based outlier detection algorithms report log-linear time performance as a function of the number of data points on many real low-dimensional datasets. However, these algorithms are unable to deliver the same level of performance on high-dimensional datasets, since their scaling behavior is exponential in the number of dimensions. In this paper, we present RBRP, a fast algorithm for mining distance-based outliers, particularly targeted at high-dimensional datasets. RBRP scales log-linearly as a function of the number of data points and linearly as a function of the number of dimensions. Our empirical evaluation demonstrates that we outperform the state-of-the-art algorithm, often by an order of magnitude.
- Subjects
ALGORITHMS; FOUNDATIONS of arithmetic; COMPUTER programming; POLAR forms (Mathematics); EUCLIDEAN algorithm; NUMBER theory; ALGEBRAIC number theory; ARITHMETIC functions; COMPLEX variables
- Publication
Data Mining & Knowledge Discovery, 2008, Vol 16, Issue 3, p349
- ISSN
1384-5810
- Publication type
Article
- DOI
10.1007/s10618-008-0093-2
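
To make the abstract's notion of a distance-based outlier concrete, the sketch below scores each point by the distance to its k-th nearest neighbor and reports the highest-scoring points. This is a naive quadratic-time baseline, not RBRP, whose details the abstract does not give. The scoring rule, the function names `knn_distance_scores` and `top_outliers`, and the sample data are all assumptions made for illustration.

```python
import math


def knn_distance_scores(points, k):
    """Score each point by the distance to its k-th nearest neighbor.

    Assumed definition for illustration: a larger score means the point
    is farther from its neighbors and so more outlying.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, sorted ascending.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def top_outliers(points, k, n):
    """Return the indices of the n points with the largest scores."""
    scores = knn_distance_scores(points, k)
    order = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return order[:n]


# Hypothetical data: four clustered points and one distant point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
print(top_outliers(points, k=2, n=1))  # prints [4]
```

Every point is compared against every other point, so the cost grows quadratically with the number of points. Per the abstract, reducing that cost on high-dimensional data is what the fast algorithms, including RBRP, aim for.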