Click it or Leave it: Detecting and Spoiling Clickbait with Informativeness Measures and Large Language Models

Wojciech Michaluk; Tymoteusz Urban; Mateusz Kubita; Soveatin Kuntur; Anna Wroblewska

arXiv:2602.18171 · cs.CL · February 23, 2026

TL;DR

This paper introduces a hybrid method that combines transformer embeddings with linguistic features to detect clickbait headlines. Its best model reaches a 91% F1-score while remaining interpretable, and the authors provide tools for reproducible research.

Contribution

The paper presents a novel hybrid approach that pairs large language model embeddings with linguistic features for effective, interpretable clickbait detection, and it outperforms the baselines it is compared against.

Findings

01. Best model achieves 91% F1-score

02. Linguistic cues improve interpretability

03. Hybrid approach outperforms traditional methods
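
For readers unfamiliar with the metric in the first finding, the F1-score is the harmonic mean of precision and recall. The sketch below computes it from confusion-matrix counts. The function name `f1_from_counts` and the counts in the usage line are illustrative assumptions and are not taken from the paper.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Return the F1-score given true positives, false positives, false negatives."""
    # Precision: share of predicted clickbait headlines that really are clickbait.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual clickbait headlines that were caught.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Made-up counts for illustration only (not results from the paper).
example_f1 = f1_from_counts(tp=8, fp=2, fn=2)
```

With these made-up counts, precision and recall are both 0.8, so the F1-score is also 0.8.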

Abstract

Clickbait headlines degrade the quality of online information and undermine user trust. We present a hybrid approach to clickbait detection that combines transformer-based text embeddings with linguistically motivated informativeness features. Using natural language processing techniques, we evaluate classical vectorizers, word embedding baselines, and large language model embeddings paired with tree-based classifiers. Our best-performing model, XGBoost over embeddings augmented with 15 explicit features, achieves an F1-score of 91%, outperforming TF-IDF, Word2Vec, GloVe, LLM prompt-based classification, and feature-only baselines. The proposed feature set enhances interpretability by highlighting salient linguistic cues such as second-person pronouns, superlatives, numerals, and attention-oriented punctuation, enabling transparent and well-calibrated clickbait predictions. We release…
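
To make the kind of linguistic cue named in the abstract concrete, here is a minimal sketch that counts second-person pronouns, superlatives, numerals, and attention-oriented punctuation in a headline. It is not the authors' 15-feature set: the function name `headline_features`, the word lists, and the tokenization are assumptions chosen for illustration. In a hybrid setup like the one the abstract describes, counts of this kind would be appended to a headline's embedding vector before being passed to a tree-based classifier.

```python
import re

# Small, hypothetical lexicons; a real feature set would be broader.
SECOND_PERSON = {"you", "your", "yours", "yourself", "yourselves"}
SUPERLATIVES = {"best", "worst", "most", "least", "biggest", "greatest"}


def headline_features(headline: str) -> dict:
    """Count a few surface cues often associated with clickbait headlines."""
    # Lowercase, then keep alphabetic words (with apostrophes) and digit runs.
    tokens = re.findall(r"[a-z']+|\d+", headline.lower())
    return {
        "second_person": sum(t in SECOND_PERSON for t in tokens),
        "superlatives": sum(t in SUPERLATIVES for t in tokens),
        "numerals": sum(t.isdigit() for t in tokens),
        "attention_punct": headline.count("!") + headline.count("?"),
        "n_tokens": len(tokens),
    }


features = headline_features("You Won't Believe These 7 Best Tricks!")
```

Because each feature is a named count, a tree-based model's feature importances can be read directly in terms of these cues, which is the interpretability benefit the abstract points to.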

Taxonomy

Topics: Misinformation and Its Impacts · Health Literacy and Information Accessibility · Text Readability and Simplification