Phishing Website Detection Using Bidirectional Gated Recurrent Unit Model and Feature Selection

Authors

D. R. I. M. Setiadi, S. Widiono, A. N. Safriandono, S. Budi

DOI:

https://doi.org/10.62411/faith.2024-15

Keywords:

BiGRU, Cyber attack, Cyber security, Phishing detection, Website phishing classification

Abstract

Phishing attacks remain a significant threat to internet users and call for more capable detection systems. This study evaluates a Bidirectional Gated Recurrent Unit (BiGRU) model combined with feature selection for detecting phishing websites. The data come from the Phishing Websites dataset in the UCI Machine Learning Repository. The data are cleaned and preprocessed, the features are normalized, and feature selection is applied to identify the attributes most relevant for classification. The BiGRU model, known for its ability to capture temporal dependencies in data, is then applied. For robust evaluation, the model is assessed with five-fold cross-validation. It achieves a Mean Accuracy, Mean Precision, Mean Recall, Mean F1 Score, and Mean AUC of 1.0 across the folds, indicating exceptional performance in distinguishing phishing from legitimate websites. The study highlights the potential of combining BiGRU models with feature selection and cross-validation to build highly accurate phishing detection systems that strengthen cybersecurity measures.
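The evaluation protocol described in the abstract (normalization followed by five-fold cross-validation with averaged metrics) can be sketched in plain Python as below. This is an illustrative sketch only, not the authors' code: the nearest-centroid classifier is a deliberately simple stand-in for the BiGRU model, the toy data is synthetic rather than the UCI dataset, the label encoding (-1 for phishing, 1 for legitimate) is an assumption, and all function names are hypothetical. The feature selection step is omitted; if added, it would be fitted on the training fold inside the loop in the same way as the scaler.

```python
import random


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train, test) index lists that keep class ratios equal across folds."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    for held_out in range(k):
        test = sorted(folds[held_out])
        train = sorted(i for f in range(k) if f != held_out for i in folds[f])
        yield train, test


def fit_minmax(rows):
    """Return per-column minimum and maximum values."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_minmax(rows, lows, highs):
    """Scale each column to [0, 1] using bounds learned on the training fold."""
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)] for row in rows]


def fit_centroids(rows, labels):
    """Stand-in classifier (NOT a BiGRU): one mean vector per class."""
    grouped = {}
    for row, label in zip(rows, labels):
        grouped.setdefault(label, []).append(row)
    return {label: [sum(col) / len(col) for col in zip(*members)]
            for label, members in grouped.items()}


def predict(centroids, row):
    """Return the label of the closest centroid (squared Euclidean distance)."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def binary_metrics(y_true, y_pred, positive=-1):
    """Accuracy, precision, recall and F1; -1 = phishing is an assumed encoding."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def cross_validate(X, y, k=5, seed=0):
    """Run k-fold evaluation and return the mean of each metric over the folds."""
    per_fold = []
    for train, test in stratified_kfold(y, k, seed):
        # Fit the scaler on the training fold only, so no test data leaks in.
        lows, highs = fit_minmax([X[i] for i in train])
        train_x = apply_minmax([X[i] for i in train], lows, highs)
        test_x = apply_minmax([X[i] for i in test], lows, highs)
        centroids = fit_centroids(train_x, [y[i] for i in train])
        predictions = [predict(centroids, row) for row in test_x]
        per_fold.append(binary_metrics([y[i] for i in test], predictions))
    return {name: sum(fold[name] for fold in per_fold) / k
            for name in per_fold[0]}


def toy_data(n=60):
    """Synthetic, trivially separable rows with values in {-1, 0, 1}."""
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        X.append([label, label, (i % 3) - 1])
        y.append(label)
    return X, y


if __name__ == "__main__":
    features, targets = toy_data()
    print(cross_validate(features, targets, k=5))
```

Fitting the scaler inside the loop is the design point worth noting: any preprocessing learned from the full dataset before splitting would let information from the held-out fold influence training, which can inflate the reported means.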

Author Biographies

Suyud Widiono, University of Technology Yogyakarta

Department of Computer Engineering, Faculty of Science and Technology, University of Technology Yogyakarta, Indonesia

Achmad Nuruddin Safriandono, Sultan Fatah University

Faculty of Engineering, Sultan Fatah University, Demak, Central Java 59516, Indonesia

Setyo Budi, Dian Nuswantoro University

Information System Department, Faculty of Computer Science, Dian Nuswantoro University, Indonesia

Published

2024-07-12

How to Cite

D. R. I. M. Setiadi, S. Widiono, A. N. Safriandono, and S. Budi, “Phishing Website Detection Using Bidirectional Gated Recurrent Unit Model and Feature Selection”, J. Fut. Artif. Intell. Tech., vol. 1, no. 2, pp. 75–83, Jul. 2024.
