ViClickbait-2025: A comprehensive dataset for Vietnamese clickbait detection
Dai Phuoc Nguyen, Thien Khai Tran, Y Minh Nguyen, Bay Vo

TL;DR
ViClickbait-2025 is a Vietnamese dataset for identifying clickbait headlines, containing 3414 annotated samples from online news platforms.
Contribution
The dataset introduces a standardized Vietnamese clickbait detection resource with detailed annotations and high inter-annotator agreement.
Findings
31.2% of the headlines in the dataset are labeled as clickbait.
The dataset includes nine attributes such as headline text, metadata, and simulated engagement metrics.
Inter-annotator agreement reached a Cohen’s Kappa of 0.822, indicating strong reliability.
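For context on the 0.822 figure: Cohen's kappa compares observed agreement p_o with the agreement p_e expected by chance from each rater's label frequencies, as κ = (p_o − p_e) / (1 − p_e). Cohen's kappa is defined for two raters. This summary does not say how a single value was obtained from three reviewers, so the averaging of pairwise kappas below is an assumption, shown as one common choice. The function names are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items (any hashable labels)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items both raters labelled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    categories = set(count_a) | set(count_b)
    p_e = sum((count_a[c] / n) * (count_b[c] / n) for c in categories)
    if p_e == 1.0:
        # Degenerate case: both raters used one identical label throughout.
        # Kappa is undefined here; returning 1.0 is a convention, not a standard.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def mean_pairwise_kappa(annotations):
    """Average Cohen's kappa over every pair of annotators (assumed aggregation)."""
    pairs = list(combinations(annotations, 2))
    if not pairs:
        raise ValueError("need at least two annotators")
    return sum(cohens_kappa(x, y) for x, y in pairs) / len(pairs)
```

As a usage example with 1 = clickbait and 0 = non-clickbait, `cohens_kappa([1, 1, 0, 0, 1, 0, 1, 0], [1, 1, 0, 0, 0, 0, 1, 1])` gives 0.5. The raters agree on 6 of 8 items (p_o = 0.75), and both use each label half the time (p_e = 0.5).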
Abstract
ViClickbait-2025 is a curated Vietnamese-language dataset developed to facilitate research on automatic clickbait detection. It comprises 3414 headline samples collected through web scraping from eight major Vietnamese online news platforms between 2023 and 2025. Each headline is annotated as either clickbait or non-clickbait, with 31.2% labeled as clickbait. The dataset includes nine key attributes, covering headline text, metadata, article summaries, and simulated engagement indicators. A preprocessing pipeline was applied to remove HTML noise, eliminate duplicates, and normalize the data. Annotation was carried out by three independent reviewers using a standardized guideline, with inter-annotator agreement reaching a Cohen’s Kappa of 0.822. Disagreements were resolved by a fourth annotator, and inconclusive cases were excluded. The final dataset spans 13 news categories and is…
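The abstract names three preprocessing steps (HTML noise removal, deduplication, normalization) but does not give their details. The sketch below shows one plausible way to implement them for Vietnamese headlines. Every specific choice here is an assumption, not the authors' pipeline: text extraction with Python's standard `html.parser`, dropping `script`/`style` contents, Unicode NFC normalization, whitespace collapsing, and exact-match deduplication on a case-folded key. NFC matters for Vietnamese because the same diacritic can arrive precomposed or as combining marks. The helper names are hypothetical.

```python
import re
import unicodedata
from html.parser import HTMLParser

# Tags treated as word boundaries so adjacent blocks do not fuse together (assumed list).
_BREAK_TAGS = {"br", "p", "div", "li"}


class _TextExtractor(HTMLParser):
    """Collects text nodes, dropping markup and script/style contents."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decodes &amp; and similar entities
        self._parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in _BREAK_TAGS:
            self._parts.append(" ")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag in _BREAK_TAGS:
            self._parts.append(" ")

    def handle_data(self, data):
        if not self._skip_depth:
            self._parts.append(data)

    def text(self):
        return "".join(self._parts)


def clean_headline(raw):
    """Strip HTML, apply NFC normalization, and collapse whitespace."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()  # flushes any buffered text
    text = unicodedata.normalize("NFC", parser.text())
    return re.sub(r"\s+", " ", text).strip()


def preprocess(raw_headlines):
    """Clean headlines, drop empties, and drop exact duplicates (case-insensitive)."""
    seen = set()
    cleaned = []
    for raw in raw_headlines:
        text = clean_headline(raw)
        if not text:
            continue
        key = text.casefold()
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(text)
    return cleaned
```

Exact-match deduplication on a normalized key is the simplest option. It will not catch near-duplicates, such as the same story republished with small wording changes. Catching those would need a fuzzy method, which this sketch does not attempt.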
Figure 1
Figure 2
Figure 3
