Analyzing Quantum Feature Engineering and Balancing Strategies Effect on Liver Disease Classification
DOI:
https://doi.org/10.62411/faith.2024-12
Keywords:
Data augmentation, Feature selection, Hepatitis classification, Preprocessing effect, Quantum Feature Engineering
Abstract
This research aims to improve the accuracy of liver disease classification using Quantum Feature Engineering (QFE) and the combined Synthetic Minority Over-sampling Technique and Tomek Links (SMOTE-Tomek) data balancing technique. Four machine learning models, eXtreme Gradient Boosting (XGB), Random Forest (RF), Support Vector Machine (SVM), and Logistic Regression (LR), were compared on the Indian Liver Patient Dataset (ILPD). QFE is applied to capture correlations and complex patterns in the data, while SMOTE-Tomek is used to address class imbalance. The results show that QFE significantly improved LR performance, raising recall and specificity up to 99%, which is very important in medical diagnosis. The combination of QFE and SMOTE-Tomek gave the best results for XGB, with an accuracy of 81%, a recall of 90%, and an F1-score of 83%. The study concludes that QFE and data balancing techniques can improve liver disease classification performance in general.
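For readers less familiar with the metrics quoted in the abstract (accuracy, recall, specificity, and F1-score), the following is a minimal sketch of how they are derived from the counts of a binary confusion matrix. The function name `classification_metrics` and the example counts are hypothetical and chosen only for illustration; they are not taken from the study's experiments.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common binary classification metrics from confusion-matrix counts.

    tp: true positives, fp: false positives,
    tn: true negatives, fn: false negatives.
    Ratios with a zero denominator are reported as 0.0.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Recall (sensitivity): share of actual positives that were detected.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: share of actual negatives that were correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Precision: share of predicted positives that were truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1-score: harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts, for illustration only (not results from the paper).
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In a diagnostic setting, recall reflects how many patients with the disease are caught, while specificity reflects how many healthy patients are correctly cleared, which is why the abstract highlights both.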
License
Copyright (c) 2024 Achmad Nuruddin Safriandono, De Rosal Ignatius Moses Setiadi, Akhmad Dahlan, Farah Zakiyah Rahmanti, Iwan Setiawan Wibisono, Arnold Adimabua Ojugo

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.