MULTICLASS CLASSIFICATION OF MARKETPLACE PRODUCTS WITH MACHINE LEARNING

Farhan Satria Aditama; Dewi Krismawati; Setia Pramana

doi:10.14710/medstat.17.1.25-35

*Farhan Satria Aditama - Directorate of Statistical Dissemination, BPS Statistics Indonesia, Jakarta, Indonesia

Dewi Krismawati - Directorate of Analysis and Statistics Development, BPS Statistics Indonesia, Jakarta, Indonesia

Setia Pramana - Politeknik Statistika STIS, Jakarta, Indonesia

Abstract

Combining marketplace data with machine learning offers Statistics Indonesia an opportunity to complete the commodity directories used in its various surveys. This research trains a product classification model on existing datasets to predict which KBKI (Klasifikasi Baku Komoditas Indonesia) category each product in a new dataset falls into. The dataset contains more than 32,000 products in 26 classes, drawn from the two biggest marketplaces in Indonesia. The classification algorithms compared are Random Forests (RF), Support Vector Machines (SVM), and Multinomial Naive Bayes (MNB). The results indicate that MNB is the most effective algorithm when the trade-off between accuracy and processing time is considered: it achieved the highest micro-average F1 scores, 91.8% for Tokopedia and 95.4% for Shopee, and the fastest execution time, approximately 5 seconds.
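To make the evaluation described in the abstract concrete, the following is a minimal sketch, not the authors' pipeline. It assumes whitespace tokenization, raw word counts, and Laplace smoothing, and it uses only the Python standard library. The product titles, the class labels (which are not real KBKI codes), and the function names `train_mnb`, `predict_mnb`, and `micro_f1` are all hypothetical and chosen for illustration. It shows a Multinomial Naive Bayes classifier scored with micro-averaged F1 and timed, which are the two quantities the abstract trades off.

```python
import math
import time
from collections import Counter, defaultdict


def train_mnb(titles, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on whitespace-tokenized titles."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> word -> count
    vocab = set()
    for title, label in zip(titles, labels):
        tokens = title.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    model = {"alpha": alpha, "vocab": vocab, "classes": {}}
    for c, n_c in class_counts.items():
        total = sum(word_counts[c].values())
        model["classes"][c] = {
            "log_prior": math.log(n_c / len(labels)),
            "counts": word_counts[c],
            # Laplace-smoothed denominator shared by every word of class c
            "denom": total + alpha * len(vocab),
        }
    return model


def predict_mnb(model, title):
    """Return the class with the highest log-posterior for one title."""
    # Words never seen in training are ignored (an assumption of this sketch).
    tokens = [t for t in title.lower().split() if t in model["vocab"]]
    best_class, best_score = None, -math.inf
    for c, p in model["classes"].items():
        score = p["log_prior"]
        for t in tokens:
            score += math.log((p["counts"][t] + model["alpha"]) / p["denom"])
        if score > best_score:
            best_class, best_score = c, score
    return best_class


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP and FN over all classes first."""
    tp = fp = fn = 0
    for c in set(y_true) | set(y_pred):
        tp += sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp += sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn += sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy data; the real study uses more than 32,000 products.
train_titles = [
    "kemeja batik pria lengan panjang",
    "kaos polos katun pria",
    "beras premium 5 kg",
    "gula pasir 1 kg",
    "charger handphone fast charging",
    "kabel data usb handphone",
]
train_labels = ["clothing", "clothing", "food", "food",
                "electronics", "electronics"]
test_titles = ["kaos batik pria", "beras merah 1 kg", "kabel charger usb"]
test_labels = ["clothing", "food", "electronics"]

start = time.perf_counter()
model = train_mnb(train_titles, train_labels)
predictions = [predict_mnb(model, t) for t in test_titles]
elapsed = time.perf_counter() - start  # training plus prediction time

score = micro_f1(test_labels, predictions)
print(predictions, score, f"{elapsed:.4f}s")
```

When every product receives exactly one predicted class, as in this sketch, micro-averaged F1 equals plain accuracy, because each misclassification counts once as a false positive and once as a false negative. Timing the same train-and-predict span for each candidate model is one simple way to compare processing cost alongside the score.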

Note: This article has supplementary file(s).

Supplementary files: Research Results (Hasil Riset), untitled, 11 KB; Research Instrument (Instrumen Riset), untitled, 75 KB.

Keywords: Machine Learning, Marketplace, Multiclass Classification.

Published in Media Statistika, Vol 17, No 1 (2024). Section: Articles. Language: English.
