COMPARISON OF RANDOM FOREST AND SUPPORT VECTOR MACHINE CLASSIFICATION METHODS FOR PREDICTING THE ACCURACY LEVEL OF MADRASAH DATA

Dodi Irawan Syarip; Khairil Anwar Notodiputro; Bagus Sartono

doi:10.14710/medstat.18.1.37-48

*Dodi Irawan Syarip

- Department of Statistics, IPB University, Kampus IPB, Jalan Meranti Wing 22 Level 4, Dramaga, Babakan, Kec. Dramaga, Kabupaten Bogor, Jawa Barat 16680, Indonesia

Khairil Anwar Notodiputro

- Department of Statistics, IPB University, Kampus IPB, Jalan Meranti Wing 22 Level 4, Dramaga, Babakan, Kec. Dramaga, Kabupaten Bogor, Jawa Barat 16680, Indonesia

Bagus Sartono

- Department of Statistics, IPB University, Kampus IPB, Jalan Meranti Wing 22 Level 4, Dramaga, Babakan, Kec. Dramaga, Kabupaten Bogor, Jawa Barat 16680, Indonesia

Abstract

This study aims to identify the most effective classification method for predicting the accuracy level of madrasah data in the presence of class imbalance. Two machine learning approaches were compared: Random Forest (RF) and Support Vector Machine (SVM). Based on the AUC values, the RF model performed slightly better in predicting the accuracy level of the madrasah data, with an average AUC of 62.82 compared with 62.33 for the SVM model. Among all models, the RF model combined with the ROSE technique achieved the highest and most consistent performance. The variable importance measures showed that the predictors with the greatest influence on the predicted accuracy level are the number of students and the ratio of students to teachers and staff. This finding suggests that school principals and madrasah administrative staff should prioritize ensuring the completeness of student, teacher, and staff data to improve the overall reliability of madrasah data.
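
To illustrate the evaluation metric named in the abstract, the following is a minimal Python sketch of how an AUC value can be computed from class labels and model scores, and why it is more informative than plain accuracy when one class is rare. The function name `auc`, the toy labels, and the two score lists are assumptions introduced here for illustration only; they are not the study's data, models, or code.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case; ties count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 1 = minority class, 0 = majority.
labels = [1, 1, 0, 0, 0, 0, 0, 0]
scores_a = [0.9, 0.4, 0.5, 0.3, 0.2, 0.2, 0.1, 0.1]  # illustrative model A
scores_b = [0.6, 0.3, 0.5, 0.4, 0.2, 0.2, 0.1, 0.1]  # illustrative model B

auc_a = auc(labels, scores_a)
auc_b = auc(labels, scores_b)

# A model that gives every case the same score is right 6 times out of 8 if it
# always predicts the majority class, yet it cannot rank cases at all.
auc_constant = auc(labels, [0.0] * len(labels))

print(f"AUC A = {auc_a:.4f}, AUC B = {auc_b:.4f}, constant = {auc_constant:.4f}")
```

In this toy setting the constant scorer illustrates the point: its accuracy looks acceptable because the majority class dominates, while its AUC stays at chance level. Averaging such AUC values over repeated resamples gives the kind of summary figure the abstract reports for each model.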

Note: This article has supplementary file(s).

Supplementary file: Dataset of the Result of the 2023 Islamic Education Data Accuracy Audit Survey (Type: Data Set, 71KB)

Keywords: Accuracy; AUC; Random Forest; ROSE; SVM

Article Info

Section: Articles

Language: EN

In Vol 18, No 1 (2025): Media Statistika
