What Makes You CLIC: Detection of Croatian Clickbait Headlines
Marija Anđelić, Dominik Šipek, Laura Majer, Jan Šnajder

TL;DR
This paper introduces CLIC, a Croatian clickbait dataset, and compares fine-tuned BERT and large language models for clickbait detection, finding fine-tuned models outperform LLMs.
Contribution
The paper provides the first Croatian clickbait dataset and evaluates the effectiveness of fine-tuned models versus in-context learning methods.
Findings
Nearly 50% of the analyzed headlines are clickbait.
Fine-tuned models outperform LLM-based in-context learning in detection accuracy.
The paper analyzes the linguistic features of Croatian clickbait.
Abstract
Online news outlets operate predominantly on an advertising-based revenue model, compelling journalists to create headlines that are often scandalous, intriguing, and provocative -- commonly referred to as clickbait. Automatic detection of clickbait headlines is essential for preserving information quality and reader trust in digital media and requires both contextual understanding and world knowledge. For this task, particularly in less-resourced languages, it remains unclear whether fine-tuned methods or in-context learning (ICL) yield better results. In this paper, we compile CLIC, a novel dataset for clickbait detection of Croatian news headlines spanning a 20-year period and encompassing mainstream and fringe outlets. We fine-tune the BERTić model on this task and compare its performance to LLM-based ICL methods with prompts both in Croatian and English. Finally, we analyze the…
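The abstract compares fine-tuning with in-context learning using prompts in both Croatian and English. As a rough illustration of what a few-shot ICL setup of this kind might look like, the sketch below assembles a prompt from labeled example headlines and parses a yes/no answer. The template wording, the yes/no label scheme, and the names `TEMPLATES`, `build_prompt`, and `parse_label` are all assumptions made for illustration; they are not the prompts or code used in the paper, and no model call is included.

```python
from typing import List, Optional, Tuple

# Hypothetical prompt templates; the paper's actual prompt wording is not given here.
TEMPLATES = {
    "en": {
        "instruction": "Is the following news headline clickbait? Answer only 'yes' or 'no'.",
        "headline": "Headline",
        "answer": "Answer",
        "yes": "yes",
        "no": "no",
    },
    "hr": {
        "instruction": "Je li sljedeći novinski naslov clickbait? Odgovori samo s 'da' ili 'ne'.",
        "headline": "Naslov",
        "answer": "Odgovor",
        "yes": "da",
        "no": "ne",
    },
}


def build_prompt(headline: str, examples: List[Tuple[str, bool]], lang: str = "en") -> str:
    """Build a few-shot prompt: instruction, labeled examples, then the query headline."""
    t = TEMPLATES[lang]
    lines = [t["instruction"], ""]
    for text, is_clickbait in examples:
        lines.append(f"{t['headline']}: {text}")
        lines.append(f"{t['answer']}: {t['yes'] if is_clickbait else t['no']}")
        lines.append("")
    # The final answer slot is left empty for the model to complete.
    lines.append(f"{t['headline']}: {headline}")
    lines.append(f"{t['answer']}:")
    return "\n".join(lines)


def parse_label(response: str, lang: str = "en") -> Optional[bool]:
    """Map a raw model response to True (clickbait), False, or None if unparseable."""
    t = TEMPLATES[lang]
    tokens = response.strip().lower().split()
    if not tokens:
        return None
    first = tokens[0].strip(".,!'\"")
    if first == t["yes"]:
        return True
    if first == t["no"]:
        return False
    return None


if __name__ == "__main__":
    # Placeholder strings stand in for real headlines from a dataset.
    demo_examples = [("<example headline 1>", True), ("<example headline 2>", False)]
    print(build_prompt("<headline to classify>", demo_examples, lang="hr"))
```

Keeping the Croatian and English templates in one table means the same examples can be rendered in either prompt language, which mirrors the kind of prompt-language comparison the abstract describes.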
Taxonomy
Topics: Web Data Mining and Analysis · Misinformation and Its Impacts · Radio, Podcasts, and Digital Media
