- Title
A cluster-based SMOTE both-sampling (CSBBoost) ensemble algorithm for classifying imbalanced data.
- Authors
Salehi, Amir Reza; Khedmati, Majid
- Abstract
In this paper, a Cluster-based Synthetic Minority Oversampling Technique (SMOTE) Both-sampling (CSBBoost) ensemble algorithm is proposed for classifying imbalanced data. The algorithm combines over-sampling, under-sampling, and different ensemble algorithms, including Extreme Gradient Boosting (XGBoost), random forest, and bagging, to achieve a balanced dataset and to address issues such as redundancy of data after over-sampling, information loss in under-sampling, and random sample selection for sampling and sample generation. The performance of the proposed algorithm is evaluated and compared with state-of-the-art competing algorithms on 20 benchmark imbalanced datasets in terms of the harmonic mean of precision and recall (F1) and the area under the receiver operating characteristic curve (AUC). Based on the results, the proposed CSBBoost algorithm performs significantly better than the competing algorithms. In addition, a real-world dataset is used to demonstrate the applicability of the proposed algorithm.
- Subjects
RECEIVER operating characteristic curves; RANDOM forest algorithms; ALGORITHMS
- Publication
Scientific Reports, 2024, Vol 14, Issue 1, p1
- ISSN
2045-2322
- Publication type
Article
- DOI
10.1038/s41598-024-55598-1
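
The abstract names F1 as the harmonic mean of precision and recall. A minimal sketch of that measure, assuming binary confusion counts as input; the function name `f1_from_counts` and the example counts are illustrative and do not come from the paper:

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Return F1, the harmonic mean of precision and recall.

    tp, fp, fn are true-positive, false-positive, and false-negative
    counts for the minority (positive) class. Hypothetical helper,
    not code from the paper.
    """
    # Precision: share of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual positives that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only: precision = 0.8, recall = 0.5.
example_f1 = f1_from_counts(tp=8, fp=2, fn=8)
```

On imbalanced data this measure is often preferred over plain accuracy because it ignores true negatives, so a classifier that labels everything as the majority class scores 0 rather than a misleadingly high value.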