
An Ensemble-Based Approach for Detecting Clickbait in Indonesian Online Media

Universitas Diponegoro, Indonesia

Received: 10 May 2025; Revised: 29 May 2025; Accepted: 30 May 2025; Available online: 30 May 2025; Published: 31 May 2025.
Editor(s): Ferda Ernawan
Open Access Copyright (c) 2025 The authors. Published by Department of Informatics, Universitas Diponegoro
Creative Commons License This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Abstract
Clickbait headlines are widely used in online media to attract readers through exaggerated or misleading titles, which can lead to user dissatisfaction and information overload. This study proposes a machine learning approach for detecting clickbait in Indonesian news headlines using classical classification models and ensemble learning. The dataset consists of headlines in Bahasa Indonesia labeled as clickbait or non-clickbait, which were processed and represented using TF-IDF vectorization. Three base classifiers (Multinomial Naive Bayes, Logistic Regression, and Support Vector Machine) were combined using soft voting and stacking ensemble methods. The experimental results indicate that the stacking ensemble achieved the highest accuracy, 0.7728, while the voting ensemble recorded the best F1-score, 0.7080, outperforming the individual classifiers. The SVM model, however, showed the largest decline in accuracy after stopword removal, dropping by 0.0410. These findings highlight the effectiveness of ensemble learning in improving clickbait detection performance and suggest room for further optimization in model selection and integration strategies.
Keywords: Clickbait Detection; Bahasa Indonesia; Online Media; Ensemble Learning; Text Classification
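
The pipeline summarized in the abstract (TF-IDF features feeding Multinomial Naive Bayes, Logistic Regression, and SVM, combined by soft voting and by stacking) could be sketched with scikit-learn roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy headlines, the default hyperparameters, the linear SVM kernel, the Logistic Regression meta-learner, and `cv=2` are all choices made here for demonstration and do not come from the paper.

```python
from sklearn.ensemble import StackingClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy headlines (invented for this sketch, not from the dataset).
headlines = [
    "Wow, 5 fakta ini bikin kamu kaget",
    "Kamu tidak akan percaya apa yang terjadi selanjutnya",
    "Rahasia artis ini akhirnya terbongkar, nomor 3 bikin heboh",
    "Ternyata ini alasan dia menangis, bikin haru",
    "Viral, begini cara mudah jadi kaya",
    "Heboh, 7 foto ini bikin netizen melongo",
    "Astaga, ini yang terjadi jika kamu minum kopi tiap hari",
    "Bikin penasaran, ini dia sosok misterius itu",
    "Pemerintah menetapkan anggaran pendidikan tahun 2025",
    "Bank Indonesia mempertahankan suku bunga acuan",
    "Gempa magnitudo 5,2 mengguncang wilayah Maluku",
    "Presiden meresmikan jalan tol baru di Jawa Tengah",
    "Harga beras naik di sejumlah pasar tradisional",
    "KPU mengumumkan jadwal pendaftaran calon kepala daerah",
    "Timnas Indonesia menang 2-0 atas Vietnam",
    "Kementerian Kesehatan melaporkan penurunan kasus demam berdarah",
]
labels = [1] * 8 + [0] * 8  # 1 = clickbait, 0 = non-clickbait


def base_estimators():
    """Return fresh copies of the three base classifiers named in the abstract."""
    return [
        ("nb", MultinomialNB()),
        ("lr", LogisticRegression(max_iter=1000)),
        # probability=True lets the SVM expose predict_proba, which soft voting needs.
        ("svm", SVC(kernel="linear", probability=True, random_state=0)),
    ]


# Soft voting: average the class probabilities of the base classifiers.
voting = make_pipeline(
    TfidfVectorizer(),
    VotingClassifier(base_estimators(), voting="soft"),
)

# Stacking: a meta-learner is trained on cross-validated base predictions.
stacking = make_pipeline(
    TfidfVectorizer(),
    StackingClassifier(
        base_estimators(),
        final_estimator=LogisticRegression(max_iter=1000),  # assumed meta-learner
        cv=2,  # small fold count only because the toy data is tiny
    ),
)

voting.fit(headlines, labels)
stacking.fit(headlines, labels)

new_headlines = [
    "Bikin kaget, ini fakta yang terjadi selanjutnya",
    "Pemerintah meresmikan pasar baru di Jawa Tengah",
]
voting_pred = voting.predict(new_headlines)
stacking_pred = stacking.predict(new_headlines)
voting_proba = voting.predict_proba(new_headlines)

print("soft voting:", voting_pred)
print("stacking:   ", stacking_pred)
```

On a real dataset the two pipelines would be compared on a held-out split using accuracy and F1-score, the two metrics the abstract reports.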


