An Ensemble-Based Approach for Detecting Clickbait in Indonesian Online Media

Sandy Kurniawan; Adhe Setya Pramayoga; Yeva Fadhilah Ashari

doi:10.14710/jmasif.16.1.73115

Universitas Diponegoro, Indonesia

Received: 10 May 2025; Revised: 29 May 2025; Accepted: 30 May 2025; Available online: 30 May 2025; Published: 31 May 2025.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Abstract

Clickbait headlines are widely used in online media to attract readers through exaggerated or misleading titles, which can lead to user dissatisfaction and information overload. This study proposes a machine learning approach for detecting clickbait in Indonesian news headlines using classical classification models and ensemble learning. The dataset consists of labeled clickbait and non-clickbait headlines in Bahasa Indonesia, which were preprocessed and represented using TF-IDF vectorization. Three base classifiers (Multinomial Naive Bayes, Logistic Regression, and Support Vector Machine) were combined using soft voting and stacking ensemble methods. In the experiments, the stacking ensemble achieved the highest accuracy (0.7728) and the voting ensemble the best F1-score (0.7080), outperforming the individual classifiers. Stopword removal reduced accuracy most for the SVM model, which dropped by 0.0410. These findings highlight the effectiveness of ensemble learning for clickbait detection and suggest room for further optimization in model selection and integration strategies.
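
To make the soft voting step concrete, the following is a minimal sketch of the general technique, not the authors' implementation. Soft voting averages the class probabilities predicted by each base classifier and picks the class with the highest mean. The probability values, the `soft_vote` and `predict_label` helpers, and the label names are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def soft_vote(prob_by_model: Dict[str, List[float]]) -> List[float]:
    """Average the class-probability vectors produced by several classifiers."""
    vectors = list(prob_by_model.values())
    n_classes = len(vectors[0])
    if any(len(v) != n_classes for v in vectors):
        raise ValueError("all models must score the same set of classes")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(n_classes)]


def predict_label(avg_probs: List[float], labels: List[str]) -> str:
    """Return the label whose averaged probability is highest."""
    best = max(range(len(avg_probs)), key=avg_probs.__getitem__)
    return labels[best]


# Hypothetical (made-up) probabilities for a single headline,
# ordered as [non-clickbait, clickbait].
LABELS = ["non-clickbait", "clickbait"]
example = {
    "multinomial_nb": [0.55, 0.45],
    "logistic_regression": [0.55, 0.45],
    "svm": [0.10, 0.90],
}

avg = soft_vote(example)
label = predict_label(avg, LABELS)
print(avg, label)
```

In this made-up case two of the three models lean toward non-clickbait, so a simple majority vote would pick that class, while the averaged probabilities favor clickbait because one model is far more confident. In scikit-learn, the same idea is available as `VotingClassifier(voting="soft")`, which requires base estimators that expose `predict_proba`.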

Keywords: Clickbait Detection; Bahasa Indonesia; Online Media; Ensemble Learning; Text Classification

Article Info

Section: Research Article

Language: EN

In Vol 16, No 1 (2025): May 2025
