- Title
An improved SMOTE based on center offset factor and synthesis strategy for imbalanced data classification.
- Authors
Zhang, Ying; Deng, Li; Huang, Hefeng; Wei, Bo
- Abstract
Learning from imbalanced data is a major challenge in machine learning. To construct balanced datasets, oversampling techniques have been studied extensively. However, many oversampling methods introduce noisy samples and blur classification boundaries, which leads to overfitting. To address this problem, this paper proposes a new oversampling method, CS-SMOTE, which synthesizes minority class samples by three-point interpolation. CS-SMOTE is based mainly on a center offset factor and a synthesis strategy. First, the method removes noise samples, calculates the center offset factor, and selects sparsely distributed minority class samples using the K-distance graph technique. Next, new samples are generated from sparse minority samples, random minority samples, and the centers of the sub-clusters that contain them. Finally, comparative experiments on 18 well-known datasets demonstrate the effectiveness and general applicability of CS-SMOTE for imbalanced data classification. The experiments show that CS-SMOTE outperforms competing methods in classification accuracy while avoiding overfitting.
- Subjects
MACHINE learning; PROBLEM solving; INTERPOLATION; STATISTICAL sampling; CLASSIFICATION
- Publication
Journal of Supercomputing, 2024, Vol 80, Issue 15, p22479
- ISSN
0920-8542
- Publication type
Article
- DOI
10.1007/s11227-024-06287-3
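
The abstract describes generating each new sample from three points: a sparse minority sample, a random minority sample, and a sub-cluster center. The record does not give the interpolation formula, so the following is only a minimal sketch of one way such a three-point interpolation could look, assuming a random convex combination of the three points. The names `three_point_sample`, `centroid`, and `oversample` are hypothetical, the sub-cluster center is assumed to be the plain mean of the sub-cluster, and noise removal, the center offset factor, and the K-distance graph selection are not modeled here (the sparse samples are simply passed in by index).

```python
import random


def three_point_sample(sparse, neighbor, center, rng=random):
    """Return a random convex combination of three points (lists of floats).

    Assumption: weights are drawn uniformly over the triangle spanned by
    the three points. The paper's actual weighting may differ.
    """
    u, v = rng.random(), rng.random()
    if u + v > 1.0:
        # Reflect back into the triangle so all three weights stay non-negative.
        u, v = 1.0 - u, 1.0 - v
    w = 1.0 - u - v
    return [u * s + v * n + w * c for s, n, c in zip(sparse, neighbor, center)]


def centroid(points):
    """Mean of a list of equal-length points (assumed sub-cluster center)."""
    count = len(points)
    return [sum(column) / count for column in zip(*points)]


def oversample(sub_cluster, sparse_indices, n_new, seed=0):
    """Generate n_new synthetic minority samples inside one sub-cluster.

    sub_cluster: minority samples belonging to the same sub-cluster.
    sparse_indices: indices of samples already judged sparse (hypothetical
    input; the selection step itself is outside this sketch).
    """
    rng = random.Random(seed)
    center = centroid(sub_cluster)
    synthetic = []
    for _ in range(n_new):
        sparse = sub_cluster[rng.choice(sparse_indices)]
        neighbor = rng.choice(sub_cluster)
        synthetic.append(three_point_sample(sparse, neighbor, center, rng))
    return synthetic
```

Because every synthetic point is a convex combination of points from the same sub-cluster, it cannot fall outside that sub-cluster's convex hull, which is one plausible reading of how an approach like this limits boundary blurring.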