COMPARISON OF RANDOM FOREST AND SUPPORT VECTOR MACHINE CLASSIFICATION METHODS FOR PREDICTING THE ACCURACY LEVEL OF MADRASAH DATA

*Dodi Irawan Syarip  -  Department of Statistics, IPB University, Kampus IPB, Jalan Meranti Wing 22 Level 4, Dramaga, Babakan, Kec. Dramaga, Kabupaten Bogor, Jawa Barat 16680, Indonesia
Khairil Anwar Notodiputro  -  Department of Statistics, IPB University, Kampus IPB, Jalan Meranti Wing 22 Level 4, Dramaga, Babakan, Kec. Dramaga, Kabupaten Bogor, Jawa Barat 16680, Indonesia
Bagus Sartono  -  Department of Statistics, IPB University, Kampus IPB, Jalan Meranti Wing 22 Level 4, Dramaga, Babakan, Kec. Dramaga, Kabupaten Bogor, Jawa Barat 16680, Indonesia
Open Access Copyright (c) 2025 MEDIA STATISTIKA under http://creativecommons.org/licenses/by-nc-sa/4.0.

Abstract
This study aims to identify the most effective classification method for predicting the accuracy level of madrasah data in the presence of class imbalance. Two machine learning approaches were employed: Random Forest (RF) and Support Vector Machine (SVM). Based on the AUC values, the RF model performed slightly better in predicting the accuracy level of the madrasah data, with an average AUC of 62.82 compared with 62.33 for the SVM model. Among all models, the highest and most consistent performance was achieved by the RF model combined with the ROSE technique. The variable importance measures showed that the predictor variables with the greatest influence on the predicted accuracy level of the madrasah data are the number of students and the student-to-teacher and staff ratio. This finding suggests that school principals and madrasah administrative staff should prioritize ensuring the completeness of student, teacher, and staff data to improve the overall reliability of madrasah data.
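The AUC used above to compare the models can be illustrated with a minimal sketch. It assumes a binary outcome (1 for the minority class, 0 otherwise) and that the reported averages are AUC values on a percentage scale; the labels, scores, and the names `model_a` and `model_b` are hypothetical toy values, not the study's data or models. The function computes the AUC as the share of positive-negative pairs in which the positive case receives the higher score, which is why it remains informative when the classes are imbalanced.

```python
def auc(labels, scores):
    """Area under the ROC curve, computed by pairwise comparison.

    Equals the proportion of (positive, negative) pairs in which the
    positive example gets the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced toy data: six negatives, two positives.
labels = [0, 0, 0, 0, 0, 0, 1, 1]
model_a = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.5, 0.9]  # illustrative scores only
model_b = [0.1, 0.2, 0.3, 0.8, 0.6, 0.7, 0.5, 0.4]  # illustrative scores only

for name, scores in [("model_a", model_a), ("model_b", model_b)]:
    # Multiply by 100 to express the AUC on a percentage scale.
    print(f"{name}: AUC = {100 * auc(labels, scores):.2f}")
```

An AUC of 50 on this scale corresponds to ranking cases no better than chance, so averages close to 62 indicate modest discrimination for both classifiers.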

Note: This article has supplementary file(s).

Supplementary data set: Dataset of the Result of the 2023 Islamic Education Data Accuracy Audit Survey (71KB)
Keywords: Accuracy; AUC; Random Forest; ROSE; SVM
