BREAST CANCER CLASSIFICATION USING SUPPORT VECTOR MACHINE (SVM) AND LIGHT GRADIENT BOOSTING MACHINE (LIGHTGBM) MODELS

*Puspita Kartikasari  -  Department of Statistics, Universitas Diponegoro, Jl. Prof. Sudarto, SH, Tembalang, Semarang 50275, Indonesia
Iut Tri Utami  -  Department of Statistics, Universitas Diponegoro, Jl. Prof. Sudarto, SH, Tembalang, Semarang 50275, Indonesia
Suparti Suparti  -  Department of Statistics, Universitas Diponegoro, Jl. Prof. Sudarto, SH, Tembalang, Semarang 50275, Indonesia
Syair Dafiq Faizur Rahman  -  Department of Statistics, Universitas Diponegoro, Jl. Prof. Sudarto, SH, Tembalang, Semarang 50275, Indonesia
Open Access Copyright (c) 2023 MEDIA STATISTIKA under http://creativecommons.org/licenses/by-nc-sa/4.0.

Abstract
This study approaches the problem of breast cancer from a statistical perspective, as one possible contribution to its management. From this point of view, breast cancer can be managed through early detection followed by fast and appropriate treatment, and both depend on classifying the diagnosis correctly. Early detection requires an accurate diagnostic model, which can be built by developing and testing statistical classification methods. The classification methods used in this study are Support Vector Machine (SVM) and Light Gradient Boosting Machine (LightGBM). Both methods are chosen for their high classification accuracy, which is attributed to algorithms that are robust and sensitive in assigning each observation to its class. The two methods are used to classify breast cancer cases as malignant or benign. The results show that SVM is the better of the two methods, with an accuracy of 97.9%.
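The accuracy quoted above is a summary of a binary confusion matrix. The abstract does not say how the figure was computed, so the following is only a minimal sketch assuming the standard definition, accuracy = (TP + TN) / (TP + TN + FP + FN), with "malignant" treated as the positive class. The names `ConfusionMatrix` and `confusion_from_labels` and the label lists are hypothetical illustrations, not the study's code or data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier (hypothetical helper, not from the study)."""
    tp: int  # malignant cases predicted malignant
    fn: int  # malignant cases predicted benign
    fp: int  # benign cases predicted malignant
    tn: int  # benign cases predicted benign

    def accuracy(self) -> float:
        # Share of all cases that were classified correctly.
        return (self.tp + self.tn) / (self.tp + self.fn + self.fp + self.tn)

    def sensitivity(self) -> float:
        # Share of malignant cases that were detected.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of benign cases that were correctly cleared.
        return self.tn / (self.tn + self.fp)


def confusion_from_labels(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str = "malignant"
) -> ConfusionMatrix:
    """Tally a confusion matrix from true and predicted class labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fn = fp = tn = 0
    for truth, guess in zip(y_true, y_pred):
        if truth == positive:
            if guess == positive:
                tp += 1
            else:
                fn += 1
        else:
            if guess == positive:
                fp += 1
            else:
                tn += 1
    return ConfusionMatrix(tp=tp, fn=fn, fp=fp, tn=tn)


# Made-up labels for illustration only; these are NOT the study's data.
y_true = ["malignant", "malignant", "benign", "benign",
          "benign", "malignant", "benign", "benign"]
y_pred = ["malignant", "benign", "benign", "benign",
          "malignant", "malignant", "benign", "benign"]

cm = confusion_from_labels(y_true, y_pred)
print(cm)
print(f"accuracy={cm.accuracy():.3f}")
print(f"sensitivity={cm.sensitivity():.3f}")
print(f"specificity={cm.specificity():.3f}")
```

In a diagnostic setting, sensitivity and specificity are usually worth reporting next to accuracy, because a missed malignant case and a false alarm on a benign case carry different costs.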
Keywords: Classification, Breast Cancer, SVM, LightGBM
Funding: LPPM Universitas Diponegoro, Riset Pengembangan dan Penerapan (RPP) scheme, contract number 609-16/UN7.D2/PP/VIII/2023.

