Title

Ensemble Model for Multiclass Imbalanced Data Using Cluster Computing of Spark.

Authors

Khandekar, Varsha S.; Shrinath, Pravin

Abstract

Big data analysis using machine learning has become a challenging problem, and classification becomes harder still when the class distribution is imbalanced. In this paper, we propose a distributed ensemble model that uses a swarm-intelligence technique, Particle Swarm Optimization (PSO), to overcome the imbalance problem. To compensate for class imbalance, SMOTE is first used to oversample the minority-class samples, and PSO-based sampling is then applied. For fast processing, the whole model is implemented on a Spark cluster, building on the parallel programming model of Spark RDDs (Resilient Distributed Datasets). The proposed system shows consistent improvements on several evaluation metrics and in overall processing time. It is evaluated using several performance metrics, and the sequential and distributed versions of the ensemble model are compared. Whereas most existing techniques perform differently on different datasets, the proposed method generalizes better, reducing the dependence of model performance on the particular dataset. The proposed model is evaluated on the KDD-CUP'99 intrusion detection dataset and an insect sensor dataset, and on both it improves on traditional sampling techniques. The F-measure is 99% on the KDD-CUP'99 dataset and 92% on the insect dataset.
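The abstract does not describe the SMOTE configuration used. As an illustration only, the sketch below shows the standard SMOTE idea in plain Python: each synthetic point lies on the line segment between a minority sample and one of its k nearest minority-class neighbours. The function names `smote` and `euclidean`, the default `k=5`, and the fixed `seed` are assumptions for this sketch, not details taken from the paper.

```python
import random
from typing import List, Sequence

Vector = List[float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def smote(minority: List[Vector], n_synthetic: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Hypothetical SMOTE sketch: create n_synthetic points by interpolating
    between a random minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Vector] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Brute-force k nearest neighbours of `base` among the other minority samples.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: euclidean(base, minority[j]),
        )[:k]
        nb = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment, uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


if __name__ == "__main__":
    minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
    new_points = smote(minority, n_synthetic=6, k=2)
    print(len(minority) + len(new_points))  # 10 minority samples after oversampling
```

A real pipeline would more likely call an existing implementation such as `SMOTE` from the imbalanced-learn package; the brute-force neighbour search here scales poorly and is only meant to make the interpolation step visible.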
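The abstract states that PSO-based sampling follows SMOTE but does not give the particle encoding or the fitness function. The sketch below is therefore only the generic global-best PSO loop, using the velocity update v ← w·v + c1·r1·(pbest − x) + c2·r2·(gbest − x), demonstrated on a toy sphere function. The function `pso_minimize` and the values `w=0.7`, `c1=c2=1.5` are illustrative assumptions. In a sampling setting, the fitness would instead score a candidate training subset, for example by a classifier's validation F-measure; that, too, is an assumption rather than the paper's stated method.

```python
import random
from typing import Callable, List, Tuple


def pso_minimize(
    fitness: Callable[[List[float]], float],
    dim: int,
    bounds: Tuple[float, float],
    n_particles: int = 20,
    iters: int = 100,
    w: float = 0.7,   # inertia weight (assumed value)
    c1: float = 1.5,  # cognitive coefficient: pull towards the particle's own best
    c2: float = 1.5,  # social coefficient: pull towards the swarm's best
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Hypothetical global-best PSO sketch that minimises `fitness`."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clamp it to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                # Global best is updated as soon as a particle beats it (asynchronous variant).
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


if __name__ == "__main__":
    sphere = lambda p: sum(v * v for v in p)  # toy fitness with minimum 0 at the origin
    best, value = pso_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
    print(best, value)
```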
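The abstract says the ensemble is distributed using Spark RDDs but does not name the base learners or how their outputs are combined. As a hypothetical, dependency-free stand-in, the sketch below splits the data into partitions (as an RDD would), trains one simple nearest-centroid model per partition, and combines the models by majority vote. The nearest-centroid learner, round-robin partitioning, and all function names are assumptions for illustration; in PySpark the per-partition training step would typically be expressed with `rdd.mapPartitions`.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[List[float], str]  # (features, label)
Model = Dict[str, List[float]]    # label -> centroid


def partition(data: Sequence[Sample], n_parts: int) -> List[List[Sample]]:
    """Round-robin split, standing in for the partitions of a Spark RDD."""
    parts: List[List[Sample]] = [[] for _ in range(n_parts)]
    for i, s in enumerate(data):
        parts[i % n_parts].append(s)
    return parts


def train_centroids(part: Sequence[Sample]) -> Model:
    """Hypothetical base learner: one mean vector (centroid) per class in the partition."""
    sums: Dict[str, List[float]] = {}
    counts: Counter = Counter()
    for x, y in part:
        acc = sums.setdefault(y, [0.0] * len(x))
        for d, v in enumerate(x):
            acc[d] += v
        counts[y] += 1
    return {y: [v / counts[y] for v in total] for y, total in sums.items()}


def predict_one(model: Model, x: Sequence[float]) -> str:
    """Label of the nearest centroid (squared Euclidean distance)."""
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, model[y])))


def ensemble_predict(models: List[Model], x: Sequence[float]) -> str:
    """Majority vote over the per-partition models."""
    votes = Counter(predict_one(m, x) for m in models)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    data = [([0.0, 0.0], "normal"), ([0.2, 0.1], "normal"),
            ([5.0, 5.0], "attack"), ([5.2, 4.9], "attack")] * 3
    # One model per non-empty partition, analogous to rdd.mapPartitions(...).
    models = [train_centroids(p) for p in partition(data, 3) if p]
    print(ensemble_predict(models, [4.8, 5.1]))  # attack
```

Training each partition independently is what makes the step embarrassingly parallel; only the small per-partition models need to be collected for voting.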
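The abstract reports F-measure values of 99% and 92% but does not say how the score is averaged across classes. The sketch below computes per-class F1 (the harmonic mean of precision and recall) and its unweighted (macro) average, which is one common choice for multiclass imbalanced data; whether the paper uses macro, weighted, or micro averaging is not stated, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def per_class_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Dict[Hashable, float]:
    """F1 score for every class that appears in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp: Counter = Counter()
    fp: Counter = Counter()
    fn: Counter = Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[t] += 1  # true class t was missed
    scores: Dict[Hashable, float] = {}
    for c in set(y_true) | set(y_pred):
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        scores[c] = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return scores


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    y_true = ["a", "a", "a", "b", "b", "c"]
    y_pred = ["a", "a", "b", "b", "b", "c"]
    print(round(macro_f1(y_true, y_pred), 4))  # 0.8667
```

Macro averaging gives every class equal weight, so a poorly predicted minority class pulls the score down; a support-weighted average would instead let the majority classes dominate.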

Subjects

BIG data; COMPUTER workstation clusters; PARTICLE swarm optimization

Publication

Ingénierie des Systèmes d'Information, 2023, Vol 28, Issue 1, p161

ISSN

1633-1311

Publication type

Academic Journal

DOI

10.18280/isi.280117
