Integrating SMOTE-Tomek and Fusion Learning with XGBoost Meta-Learner for Robust Diabetes Recognition
DOI:
https://doi.org/10.62411/faith.2024-11
Keywords:
Diabetes Classification, Ensemble Learning, XGBoost Meta-Learner, SMOTE-Tomek, Deep Learning in Healthcare
Abstract
This research aims to develop a robust diabetes classification method by integrating the Synthetic Minority Over-sampling Technique (SMOTE)-Tomek technique for data balancing with a machine learning ensemble led by eXtreme Gradient Boosting (XGB) as a meta-learner. We propose an ensemble model that combines deep learning techniques such as Bidirectional Long Short-Term Memory (BiLSTM) and Bidirectional Gated Recurrent Units (BiGRU) with an XGB classifier as the base learner. The data comprise the Pima Indians Diabetes and Iraqi Society Diabetes datasets, which were preprocessed by handling missing values, removing duplicates, normalizing features, and applying SMOTE-Tomek to resolve class imbalance. XGB, as a meta-learner, improves the model's predictive ability by reducing bias and variance, resulting in more accurate and robust classification. The proposed ensemble model achieves 100% accuracy, precision, recall, specificity, and F1 score on all tested datasets. These results indicate that combining ensemble learning techniques with a rigorous preprocessing approach can significantly improve diabetes classification performance.
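The balancing step named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names `smote` and `remove_tomek_links`, the toy data, the neighbor count `k`, and the choice to drop both members of each Tomek link are all assumptions made for illustration. SMOTE creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbors; a Tomek link is a pair of mutually nearest samples with different labels, and removing such pairs cleans the class boundary.

```python
import math
import random


def smote(X, y, minority, k=3, seed=0):
    """Oversample the `minority` label by interpolation until the two
    classes have equal counts (two-class case, illustrative only)."""
    rng = random.Random(seed)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    n_needed = (len(X) - len(minority_pts)) - len(minority_pts)
    X_new, y_new = list(X), list(y)
    for _ in range(max(0, n_needed)):
        base = rng.choice(minority_pts)
        # k nearest minority neighbors of the chosen base point
        neighbors = sorted(
            (p for p in minority_pts if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> other
        X_new.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
        y_new.append(minority)
    return X_new, y_new


def remove_tomek_links(X, y):
    """Drop both members of every Tomek link: a pair of samples that are
    each other's nearest neighbor but carry different labels."""
    def nearest(i):
        return min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(X[i], X[j]),
        )

    nn = [nearest(i) for i in range(len(X))]
    drop = {i for i, j in enumerate(nn) if nn[j] == i and y[i] != y[j]}
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical toy data: six majority samples (label 0), three minority (label 1).
X = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (0.9, 0.8), (1.3, 1.1),
     (3.0, 3.0), (3.2, 2.9), (2.9, 3.3)]
y = [0] * 6 + [1] * 3

X_bal, y_bal = smote(X, y, minority=1)
X_clean, y_clean = remove_tomek_links(X_bal, y_bal)
print(len(X), len(X_bal), len(X_clean))
```

In practice, resampling of this kind is usually applied only to the training split, so that synthetic points derived from test samples do not leak into evaluation; whether the paper does so is not stated in the abstract.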
Copyright (c) 2024 De Rosal Ignatius Moses Setiadi, Kristiawan Nugroho, Ahmad Rofiqul Muslikh, Syahroni Wahyu Iriananda, Arnold Adimabua Ojugo

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.