*Farhan Satria Aditama - Directorate of Statistical Dissemination, BPS Statistics Indonesia, Jakarta, Indonesia
Dewi Krismawati - Directorate of Analysis and Statistics Development, BPS Statistics Indonesia, Jakarta, Indonesia
Setia Pramana - Politeknik Statistika STIS, Jakarta, Indonesia
Marketplace data combined with machine learning gives Statistics Indonesia an opportunity to complete the commodity directories used in its various surveys. This research trains a product classification model on existing datasets to predict which KBKI category a new product falls into. The dataset contains more than 32,000 products in 26 classes, drawn from the two largest marketplaces in Indonesia, Tokopedia and Shopee. The classification algorithms compared are Random Forests (RF), Support Vector Machines (SVM), and Multinomial Naive Bayes (MNB). The results indicate that MNB is the most effective algorithm when the trade-off between accuracy and processing time is considered. MNB achieved the highest micro-average F1 scores, 91.8% for Tokopedia and 95.4% for Shopee, and the fastest execution time, approximately 5 seconds.

Keywords: Machine Learning, Marketplace, Multiclass Classification.
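
To make the approach in the abstract concrete, the following is a minimal sketch of Multinomial Naive Bayes text classification scored with micro-averaged F1. It is not the authors' pipeline: the abstract does not describe the feature extraction, so the sketch assumes a plain bag-of-words over product titles with Laplace smoothing. The titles, the three class labels, and the names `tokenize`, `MultinomialNB`, and `micro_f1` are hypothetical and chosen only for illustration; they are not real KBKI categories or marketplace records.

```python
import math
from collections import Counter, defaultdict


def tokenize(title):
    """Lowercase a product title and split it on whitespace (assumed preprocessing)."""
    return title.lower().split()


class MultinomialNB:
    """Multinomial Naive Bayes over word counts with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, titles, labels):
        self.class_counts = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)    # word counts per class
        for title, label in zip(titles, labels):
            self.word_counts[label].update(tokenize(title))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.n_docs = len(labels)
        return self

    def predict_one(self, title):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(title) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            denom = total + self.alpha * len(self.vocab)
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_label / self.n_docs)
            for t in tokens:
                score += math.log((self.word_counts[label][t] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def predict(self, titles):
        return [self.predict_one(t) for t in titles]


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP, and FN over all classes, then compute F1."""
    tp = fp = fn = 0
    for c in set(y_true) | set(y_pred):
        for t, p in zip(y_true, y_pred):
            tp += (t == c and p == c)
            fp += (t != c and p == c)
            fn += (t == c and p != c)
    return 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 0.0


# Hypothetical toy data; the labels are placeholders, not KBKI codes.
train_titles = [
    "beras premium 5 kg", "beras merah organik",
    "kemeja pria lengan panjang", "kemeja batik pria",
    "smartphone android 128gb", "smartphone layar 6 inci",
]
train_labels = ["food", "food", "clothing", "clothing", "electronics", "electronics"]
test_titles = ["beras organik 2 kg", "kemeja batik lengan pendek", "smartphone android murah"]
test_labels = ["food", "clothing", "electronics"]

model = MultinomialNB().fit(train_titles, train_labels)
preds = model.predict(test_titles)
print(preds, micro_f1(test_labels, preds))
```

When every product receives exactly one label, as in this sketch, the pooled false positives equal the pooled false negatives, so micro-averaged F1 coincides with overall accuracy.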

