
Comparative Evaluation of Machine Learning Algorithms with Data Balancing Approach and Hyperparameter Tuning in Predicting Thyroid Disorder Recurrence

Universitas Dian Nuswantoro, Indonesia

Received: 25 Jun 2025; Revised: 1 Oct 2025; Accepted: 10 Nov 2025; Published: 30 Nov 2025.
Copyright (c) 2025 The authors. Published by Department of Informatics, Universitas Diponegoro.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Abstract
This research evaluates and compares the performance of five machine learning algorithms (Logistic Regression, K-Nearest Neighbors, Decision Tree, Random Forest, and Gradient Boosting) in predicting thyroid disease recurrence from patient data. The analysis was conducted on the Thyroid Disease Dataset from the UCI Machine Learning Repository. The methodology includes data preprocessing, normalization, and class balancing with the Synthetic Minority Over-sampling Technique (SMOTE). Hyperparameter tuning was then performed with GridSearchCV to optimize model performance. The results show that the ensemble models, Random Forest and Gradient Boosting, consistently outperform the other algorithms in accuracy and robustness, reaching 95–96% accuracy across the evaluated scenarios. A key finding is that SMOTE significantly improves recall for the minority classes, highlighting its value for imbalanced medical datasets.
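The abstract names SMOTE as the balancing step but does not state its settings. As a rough illustration of the idea only, the sketch below implements a minimal SMOTE in plain Python (standard library only) rather than a library implementation such as imbalanced-learn's `SMOTE`: each synthetic minority row is placed at a random point on the line segment between a minority row and one of its k nearest minority-class neighbours. The function names `smote` and `oversample_to_balance`, the neighbour count `k`, and the toy data are hypothetical, not the authors' code or configuration, and the sketch assumes a binary target with numeric, already-normalized features.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_synthetic, k=5, seed=0):
    """Hypothetical minimal SMOTE: create n_synthetic rows by interpolating
    between a randomly chosen minority row and one of its k nearest
    minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest other minority rows (brute-force search).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: euclidean(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def oversample_to_balance(X, y, minority_label, k=5, seed=0):
    """Append synthetic minority rows until a binary target is balanced.
    Returns new lists; the inputs are not modified."""
    minority = [row for row, label in zip(X, y) if label == minority_label]
    n_missing = (len(y) - len(minority)) - len(minority)
    new_rows = smote(minority, max(n_missing, 0), k=k, seed=seed)
    return X + new_rows, y + [minority_label] * len(new_rows)


# Toy, made-up data: six class-0 rows and three class-1 ("recurrence") rows.
X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25], [0.3, 0.2], [0.25, 0.3], [0.2, 0.2],
     [0.8, 0.9], [0.9, 0.8], [0.85, 0.95]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]
X_bal, y_bal = oversample_to_balance(X, y, minority_label=1, k=2)
print(y_bal.count(0), y_bal.count(1))  # 6 6
```

Because every synthetic row lies between two real minority rows, the new samples stay inside the region the minority class already occupies. In a pipeline like the one described, oversampling is normally applied only to the training data (or to the training folds inside cross-validation), so that synthetic rows derived from evaluation data do not leak into the reported scores.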
Keywords: Medical Diagnosis, Thyroid Disease, Machine Learning, SMOTE, Hyperparameter Tuning
