- Title
Ensemble Model for Multiclass Imbalanced Data Using Cluster Computing of Spark.
- Authors
Khandekar, Varsha S.; Shrinath, Pravin
- Abstract
Big data analysis using machine learning is a challenging problem, and classification becomes harder still when the class distribution is imbalanced. In this paper, we propose a distributed ensemble model with an intelligent technique based on Particle Swarm Optimization (PSO) to overcome the class imbalance problem. To compensate for the imbalance, SMOTE is first used to balance the minority class samples, and then PSO-based sampling is applied. For fast processing, the whole model is implemented on a Spark cluster, building on the parallel programming model of Spark RDDs. The proposed system is evaluated using several performance metrics and a comparison between sequential and distributed ensemble models, and it shows consistent improvements in both the evaluation metrics and overall processing time. Most existing techniques perform differently on different datasets, whereas the proposed method generalizes better, which reduces the data-model dependency issue. The proposed model is evaluated on the KDD-CUP'99 intrusion detection and insect sensor datasets, where it improves on traditional sampling techniques. The F-measure is 99% for the KDD-CUP'99 dataset and 92% for the insect dataset.
- Subjects
BIG data; COMPUTER workstation clusters; PARTICLE swarm optimization
- Publication
Ingénierie des Systèmes d'Information, 2023, Vol 28, Issue 1, p161
- ISSN
1633-1311
- Publication type
Academic Journal
- DOI
10.18280/isi.280117
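
The abstract names SMOTE as the first balancing step. As a rough illustration only, the sketch below shows the basic SMOTE idea of creating synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. It is not the authors' implementation: the function name `smote_like_oversample`, its parameters, and the toy data are assumptions made for this example, and the PSO-based sampling and Spark RDD parallelism described in the abstract are not covered.

```python
import random
from typing import List, Sequence

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_like_oversample(
    minority: List[Point], n_new: int, k: int = 3, seed: int = 0
) -> List[List[float]]:
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of base (excluding base).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: squared_distance(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy minority class: the four corners of the unit square.
minority_samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_samples = smote_like_oversample(minority_samples, n_new=5)
```

Because every synthetic point lies on a segment between two existing minority points, the new samples stay inside the region the minority class already occupies.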