- Title
Binary imbalanced big data classification based on fuzzy data reduction and classifier fusion.
- Authors
Zhai, Junhai; Wang, Mohan; Zhang, Sufang
- Abstract
The era of big data has arrived, making it impossible for traditional machine learning algorithms to perform training in a stand-alone computing environment. In this paper, we propose a method for imbalanced binary classification of large-scale datasets based on undersampling and ensemble learning. More specifically, our method first adaptively partitions the majority-class data into k clusters and then undersamples them to create k balanced datasets. Subsequently, one base classifier is trained on each balanced dataset, and the k classifiers are combined to make the final prediction. Existing undersampling methods randomly select a subset of the majority class; thus, important instances may be lost during the process. In contrast, our proposed fuzzy data reduction scheme selects informative instances from each cluster, preventing information loss. Traditional ensemble methods have negative correlations between the base classifiers, whereas our proposed classifier fusion scheme fuses the base classifiers using a fuzzy integral to model the relations between them. The proposed algorithm is evaluated on six large imbalanced datasets and compared with state-of-the-art undersampling and ensemble methods, including the synthetic minority oversampling technique bagging (SMOTE-Bagging), SMOTE-Boost, and Binary Ensemble Classification for Imbalanced big data based on MapReduce and Upper sampling (BECIMU). Quantitative evaluations and theoretical analysis demonstrate that the proposed method outperforms these three state-of-the-art methods by 1.47%, 2.00%, and 2.03% in average G-mean and by 3.15%, 2.15%, and 2.52% in average AUC, respectively.
- Subjects
DATABASES; DATA reduction; FUZZY integrals; BIG data; MACHINE learning
- Publication
Soft Computing - A Fusion of Foundations, Methodologies & Applications, 2022, Vol 26, Issue 6, p2781
- ISSN
1432-7643
- Publication type
Article
- DOI
10.1007/s00500-021-06654-9
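
The pipeline outlined in the abstract (cluster the majority class, build one balanced subset per cluster, train one base classifier per subset, combine the predictions, score with G-mean) can be illustrated with a minimal sketch. This is not the authors' implementation, and every component is an assumed stand-in: plain k-means replaces the adaptive partitioning, keeping the points nearest each cluster center replaces the fuzzy data reduction, a nearest-centroid rule serves as the base classifier, and a majority vote replaces the fuzzy-integral fusion. The function names (`kmeans`, `train_ensemble`, `predict`, `g_mean`) are hypothetical. Only the G-mean definition, the geometric mean of sensitivity and specificity, is standard.

```python
import math
import random


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns the non-empty clusters as lists of points."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        centers = [centroid(c) if c else centers[j] for j, c in enumerate(clusters)]
    return [c for c in clusters if c]


def train_ensemble(majority, minority, k=3, seed=0):
    """Fit one nearest-centroid model per majority cluster, each paired with the minority class."""
    minority_center = centroid(minority)
    models = []
    for cluster in kmeans(majority, k, seed=seed):
        center = centroid(cluster)
        # Assumed stand-in for fuzzy data reduction: keep the points nearest
        # the cluster center, at most as many as there are minority points.
        kept = sorted(cluster, key=lambda p: math.dist(p, center))[:len(minority)]
        models.append((centroid(kept), minority_center))
    return models


def predict(models, x):
    """Majority vote (assumed stand-in for fuzzy-integral fusion); 1 = minority class."""
    votes = sum(math.dist(x, pos) < math.dist(x, neg) for neg, pos in models)
    return 1 if 2 * votes > len(models) else 0


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity; both classes must be present."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    pos = sum(t == 1 for t in y_true)
    neg = len(y_true) - pos
    return math.sqrt((tp / pos) * (tn / neg))
```

Calling `train_ensemble(majority, minority, k=3)` on two lists of feature tuples returns the fitted models, and `predict(models, x)` labels a single point. A subset is only approximately balanced when a cluster holds fewer points than the minority class, which a fuller implementation would need to handle.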