
MULTICLASS CLASSIFICATION OF MARKETPLACE PRODUCTS WITH MACHINE LEARNING

*Farhan Satria Aditama - Directorate of Statistical Dissemination, BPS Statistics Indonesia, Jakarta, Indonesia
Dewi Krismawati - Directorate of Analysis and Statistics Development, BPS Statistics Indonesia, Jakarta, Indonesia
Setia Pramana - Politeknik Statistika STIS, Jakarta, Indonesia
Open Access. Copyright (c) 2024 MEDIA STATISTIKA, published under the license at http://creativecommons.org/licenses/by-nc-sa/4.0.

Abstract
Using marketplace data and machine learning to collect commodity data gives Statistics Indonesia an opportunity to complete the commodity directories used in its surveys. This research uses machine learning to train a product classification model on existing datasets and then predict the KBKI (Klasifikasi Baku Komoditas Indonesia) category of new data. The dataset contains more than 32,000 products in 26 classes, drawn from the two biggest marketplaces in Indonesia, Tokopedia and Shopee. The classification algorithms compared are Random Forests (RF), Support Vector Machines (SVM), and Multinomial Naive Bayes (MNB). The results indicate that MNB is the most effective algorithm when the trade-off between accuracy and processing time is considered: it achieved the highest micro-averaged F1 scores, 91.8% for Tokopedia and 95.4% for Shopee, and the fastest execution time, approximately 5 seconds.
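
The abstract does not specify the feature representation or preprocessing, so the following is only a minimal from-scratch sketch of the kind of classifier and metric it names, not the authors' implementation. It assumes whitespace tokenization of product titles, bag-of-words counts, and Laplace smoothing (alpha = 1) for Multinomial Naive Bayes, and it computes the micro-averaged F1 score by pooling true positives, false positives, and false negatives over all classes. The product titles and the labels `rice`, `cooking_oil`, and `soap` are hypothetical examples, not the paper's data or real KBKI codes.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a minimal stand-in for real preprocessing)."""
    return text.lower().split()


class MultinomialNB:
    """Multinomial Naive Bayes over bag-of-words counts with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)          # documents per class
        self.word_counts = defaultdict(Counter)      # class -> token -> count
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict_one(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # log prior + sum of log smoothed token likelihoods
            score = math.log(self.class_counts[c] / n_docs)
            denom = self.total_words[c] + self.alpha * v
            for tok in tokenize(text):
                if tok in self.vocab:  # tokens never seen in training are ignored
                    score += math.log((self.word_counts[c][tok] + self.alpha) / denom)
            if score > best_score:
                best_class, best_score = c, score
        return best_class

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP and FN over all classes, then compute F1 once."""
    tp = fp = fn = 0
    for c in set(y_true) | set(y_pred):
        for t, p in zip(y_true, y_pred):
            if p == c and t == c:
                tp += 1
            elif p == c:
                fp += 1
            elif t == c:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


# Hypothetical marketplace product titles and labels (illustration only).
train_titles = [
    "beras premium 5 kg",
    "beras pandan wangi 10 kg",
    "minyak goreng 2 liter",
    "minyak goreng kemasan pouch 1 liter",
    "sabun mandi cair 450 ml",
    "sabun cuci piring jeruk nipis",
]
train_labels = ["rice", "rice", "cooking_oil", "cooking_oil", "soap", "soap"]

model = MultinomialNB().fit(train_titles, train_labels)
test_titles = ["beras merah 1 kg", "minyak goreng sawit 5 liter", "sabun batang"]
test_labels = ["rice", "cooking_oil", "soap"]
pred = model.predict(test_titles)
print(pred, micro_f1(test_labels, pred))  # prints ['rice', 'cooking_oil', 'soap'] 1.0
```

For single-label multiclass prediction, every wrong prediction adds exactly one false positive (for the predicted class) and one false negative (for the true class), so micro-averaged precision, recall, and F1 all equal plain accuracy. For example, `micro_f1(["a", "b", "c"], ["a", "b", "b"])` gives 2/3.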

Note: This article has two supplementary files: Research Results (untitled, 11 KB) and Research Instrument (untitled, 75 KB).
Keywords: Machine Learning, Marketplace, Multiclass Classification.
