Title

AN IMPROVED DIABETES MELLITUS PREDICTION MODEL THROUGH ENSEMBLE LEARNING AND GINI INDEX-BASED FEATURE SELECTION.

Authors

Ibrahim, Rukkayya Yahaya; Yusuf, Sahabi A.; Abdullahi, Mohammed; Isuwa, Jeremiah

Abstract

Diabetes Mellitus (DM) is a condition in which the body cannot regulate blood sugar because of improper insulin production or use, and it poses a significant global health burden. Traditional detection methods rely on clinical assessments and basic lab tests, but recent technological advancements suggest that Machine Learning (ML) algorithms can predict DM more effectively and efficiently. However, current ML models face challenges such as feature redundancy, feature irrelevance, and dataset imbalance, which can reduce accuracy and interpretability and ultimately affect patient outcomes. This paper aims to address these challenges by developing an enhanced ML-based DM prediction model. The proposed model uses an ensemble soft voting classifier that integrates the Random Forest, Logistic Regression, and Naïve Bayes algorithms. Feature importance is determined by the Gini Index Random Forest (GI-RF) algorithm. Additionally, three data imbalance handling techniques, namely random oversampling (ROS), random undersampling (RUS), and the synthetic minority oversampling technique (SMOTE), are employed to mitigate biased model development. Initially, the GI-RF algorithm identifies the top 5 most informative features from the PIMA Indians Diabetes Dataset, which originally comprises 8 features. Subsequently, the dataset is subjected to each of the three imbalance handling techniques. The performance of each model variation, each incorporating a different imbalance handling technique, is then extensively compared. The results demonstrate that ROS notably outperforms RUS and SMOTE across multiple metrics, including accuracy, F1 score, recall, and AUC. A comparative analysis with existing studies reveals the proposed method's notable improvements across all metrics, with increases of 5% in accuracy, 8% in precision, 13% in F1 score, 18% in recall, and 4% in AUC. This demonstrates the proposed model's overall robustness and effectiveness in predictive modeling, contributing to more accurate diagnosis and treatment of DM.
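
The three building blocks named in the abstract (Gini-based feature scoring, random oversampling, and soft voting) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the single-threshold split, and the toy inputs are assumptions made for illustration, and a real pipeline would more likely rely on library versions of Random Forest, ROS, and a voting classifier. The sketch only shows the arithmetic each step rests on: the Gini impurity 1 - sum(p_k^2), the impurity decrease that tree-based importance scores accumulate, duplication of minority-class rows, and averaging of class probabilities.

```python
import random
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(values, labels, threshold):
    """Impurity reduction from splitting one feature at `threshold`.

    Tree-based Gini importance sums reductions like this over all splits
    that use the feature; here only a single hypothetical split is scored.
    """
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - weighted


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def soft_vote(probabilities):
    """Average the per-class probabilities of several classifiers.

    `probabilities` holds one list of class probabilities per model.
    Returns (index of the class with the highest average, averages).
    """
    n_models = len(probabilities)
    n_classes = len(probabilities[0])
    avg = [sum(p[k] for p in probabilities) / n_models
           for k in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__), avg


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    print(gini_decrease([1, 2, 3, 4], [0, 0, 1, 1], threshold=2))
    rows, labels = random_oversample([[1], [2], [3], [4]], [0, 0, 0, 1])
    print(Counter(labels))
    print(soft_vote([[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]))
```

In this sketch, soft voting picks the class with the highest mean probability, so one confident model can outweigh two lukewarm ones, which is the main difference from majority (hard) voting. Ranking features by accumulated impurity decrease and keeping the top 5 would correspond to the selection step the abstract describes, under the assumptions stated above.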

Subjects

ENSEMBLE learning; FEATURE selection; TECHNOLOGICAL innovations; VOTING machines; RANDOM forest algorithms

Publication

Science World Journal, 2024, Vol 19, Issue 4, p1260

ISSN

2756-391X

Publication type

Academic Journal

DOI

10.4314/swj.v19i4.48
