- Title
Effective Resampling Approach for Skewed Distribution on Imbalanced Data Set.
- Authors
Mar Mar Nwe; Khin Thidar Lynn
- Abstract
Accurately classifying unseen data from imbalanced data sets is difficult because learning classifiers tend to be biased towards the majority class and to ignore the minority class. Moreover, the class distribution of imbalanced data has a significant impact on the classifier's misclassification rate. This paper therefore introduces a data pre-processing approach that improves the efficiency of imbalanced data classification by focusing on the skewed distribution of data points in the imbalanced data set. The proposed approach combines over-sampling and under-sampling techniques based on k-means clustering to overcome the small-disjunct and small-sample-size problems associated with imbalanced learning. A Tomek Link-based under-sampling method is also incorporated into the proposed cluster-based resampling methods to address class overlap by eliminating majority samples in overlapping regions. Experiments are performed on 25 standard imbalanced data sets with four learning classifiers and validated with three popular metrics: Area Under the Curve (AUC), Geometric Mean (G-mean) and Balanced Accuracy (BA). In particular, we show that the proposed approach outperforms other state-of-the-art resampling methods according to performance metrics, probabilistic estimation, statistical analysis and a multicriteria decision-making (MCDM) methodology.
- Subjects
SKEWNESS (Probability theory); DATA distribution; K-means clustering; STATISTICS; FORECASTING; STATISTICAL bootstrapping; PROBABILISTIC number theory
- Publication
IAENG International Journal of Computer Science, 2020, Vol 47, Issue 2, p149
- ISSN
1819-656X
- Publication type
Article
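
The abstract mentions Tomek Link-based under-sampling and two of the evaluation metrics, G-mean and Balanced Accuracy. The following is a minimal sketch of those generic building blocks for a binary problem, not the authors' implementation: the function names, the use of Euclidean distance, and the NumPy-only approach are assumptions made for illustration, and the k-means cluster-based resampling step of the paper is not shown.

```python
import numpy as np


def tomek_link_majority_indices(X, y, majority_label):
    """Return indices of majority samples that take part in a Tomek link.

    Assumption: Euclidean distance, computed over all pairs (O(n^2) memory),
    so this is only suitable for small data sets.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared distances; the diagonal is masked so a point
    # is never its own nearest neighbour.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)
    nearest = dist.argmin(axis=1)
    idx = np.arange(len(X))
    # A Tomek link is a pair of mutual nearest neighbours with different labels.
    mutual = nearest[nearest] == idx
    opposite = y[nearest] != y
    return idx[mutual & opposite & (y == majority_label)]


def g_mean_and_balanced_accuracy(y_true, y_pred, positive_label):
    """Return (G-mean, Balanced Accuracy) for a binary problem.

    Both are built from the true positive rate and the true negative rate;
    both classes must be present in y_true.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    pos = y_true == positive_label
    tpr = np.mean(y_pred[pos] == positive_label)
    tnr = np.mean(y_pred[~pos] != positive_label)
    return float(np.sqrt(tpr * tnr)), float((tpr + tnr) / 2)


if __name__ == "__main__":
    # Toy 1-D data: label 0 is the majority class, label 1 the minority class.
    X = [[0.0], [0.9], [2.0], [2.4], [5.0]]
    y = np.array([0, 0, 0, 1, 1])
    drop = tomek_link_majority_indices(X, y, majority_label=0)
    keep = np.setdiff1d(np.arange(len(y)), drop)
    print("majority samples removed:", drop.tolist())
    print("remaining labels:", y[keep].tolist())
    print(g_mean_and_balanced_accuracy([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1], 1))
```

Removing only the majority member of each link matches the abstract's description of eliminating majority samples in overlapping regions. For real experiments, a maintained library implementation would be a safer choice than this sketch.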