
Implementation of the Ensemble Machine Learning Algorithm for Student Dropout Prediction Analysis

*Winarsih Winarsih  -  Doctoral Program of Information System, School of Post Graduate Studies, Diponegoro University, Jl. Imam Bardjo S.H., No. 5, Pleburan, Semarang, Indonesia 50241, Indonesia
Heri Sutanto  -  Department of Physics, Faculty of Science and Mathematics, Diponegoro University, Jl. Prof. Soedarto, S.H., Tembalang, Semarang, Indonesia 50275, Indonesia
Aris Puji Widodo  -  Department of Informatics, Faculty of Science and Mathematics, Diponegoro University, Jl. Prof. Soedarto, S.H., Tembalang, Semarang, Indonesia 50275, Indonesia
Open Access Copyright (c) 2025 Jurnal Sistem Informasi Bisnis

Abstract

Educational Data Mining offers an effective way to address many problems in the education sector, including predicting student attrition from academic data. This research uses the publicly accessible Open University Learning Analytics dataset (OULAD), which contains student information collected during online learning. We apply several Machine Learning models: Decision Tree, Naïve Bayes, Logistic Regression, and the ensemble approaches Random Forest and AdaBoost. Under the data splitting approach, Random Forest (RF) achieved the highest accuracy among the models tested, 89.37%, with a precision of 89.57% and a recall of 93.86%. Under an alternative evaluation scheme, K-Fold Cross Validation, the maximum F1 score achieved was 9.45%. In summary, the ensemble machine learning algorithm Random Forest (RF) performed strongly in predicting the quality of student academic achievement.
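The abstract reports precision, recall, and F1, and contrasts a single data split with K-Fold Cross Validation. The sketch below shows how these quantities relate, using only the Python standard library. It is not the authors' pipeline: the function names, the confusion counts, and the fold layout (contiguous, unshuffled) are assumptions made for illustration. The only values taken from the abstract are the precision (89.57%) and recall (93.86%) reported for Random Forest.

```python
from typing import Iterator, List, Tuple


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the positive class from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, f1_from(precision, recall)


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k contiguous, unshuffled folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Precision and recall as reported in the abstract for Random Forest (data split).
implied_f1 = f1_from(0.8957, 0.9386)
print(f"F1 implied by the reported precision and recall: {implied_f1:.4f}")

# Hypothetical confusion counts, for illustration only.
p, r, f = precision_recall_f1(tp=90, fp=10, fn=5)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")

# Hypothetical setup: 10 samples, 5 folds. Each sample is tested exactly once.
folds = list(k_fold_indices(10, 5))
for train_idx, test_idx in folds:
    print("test fold:", test_idx)
```

With a single data split, each metric is computed once on the held-out portion. With K-Fold Cross Validation, the metrics are computed on each test fold and then typically averaged, so every sample contributes to the evaluation exactly once.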

Keywords: Bi-criteria scheduling; Machine Learning; Student dropout prediction; Data Mining; Random Forest; Open University Learning Analytics (OULAD)




Last update: 2025-06-14 03:06:52
