Machine Learning Based Detection of Clickbait Posts in Social Media
Xinyue Cao, Thai Le, Jason (Jiasheng) Zhang

TL;DR
This paper builds a machine learning model on a selected subset of features to detect clickbait headlines on social media, with the aim of improving user experience by filtering misleading content.
Contribution
It introduces a feature selection process and applies Random Forest Regression to clickbait detection on a large annotated dataset, reporting 82% accuracy.
Findings
Achieved 82% accuracy in detecting clickbait.
Selected the 60 most important features from 331 candidates.
Demonstrated effectiveness of Random Forest Regression.
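The two steps summarized above (keep the top-ranked features, then score a regression output as a classifier) can be sketched in a few lines. This is only an illustration, not the authors' code: it assumes importance scores are already available (for example from a fitted random forest), the feature names are hypothetical, and the 0.5 cut-off for turning a regression score into a clickbait/non-clickbait decision is an assumption.

```python
from typing import Dict, List


def select_top_features(importances: Dict[str, float], k: int) -> List[str]:
    """Return the k feature names with the highest importance scores."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def accuracy_from_scores(predicted: List[float], truth: List[float],
                         threshold: float = 0.5) -> float:
    """Binarize predicted and true scores at `threshold` and return the hit rate."""
    if len(predicted) != len(truth) or not predicted:
        raise ValueError("predicted and truth must be non-empty and equally long")
    hits = sum((p >= threshold) == (t >= threshold)
               for p, t in zip(predicted, truth))
    return hits / len(predicted)


# Hypothetical importance scores; the paper ranks 331 features and keeps 60.
toy_importances = {
    "num_exclamation": 0.30,
    "title_length": 0.25,
    "starts_with_number": 0.20,
    "num_stopwords": 0.05,
}
top_two = select_top_features(toy_importances, k=2)

# Assumed threshold of 0.5 on both the predicted and the annotated score.
toy_accuracy = accuracy_from_scores([0.8, 0.2, 0.6, 0.4], [1.0, 0.0, 0.33, 0.66])
```

With real data, `k` would be 60 and the importance dictionary would hold one entry per candidate feature.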
Abstract
Clickbait headlines use misleading titles that hide critical information about, or exaggerate, the content of the landing pages to entice clicks. Because clickbait often relies on eye-catching wording to attract viewers, the target content is often of low quality. Clickbait is especially widespread on social media such as Twitter, adversely impacting user experience by causing immense dissatisfaction. Hence, it has become increasingly important to put forward a widely applicable approach to identify and detect clickbait. In this paper, we use a dataset from the Clickbait Challenge 2017 (clickbait-challenge.com) comprising over 21,000 headlines/titles, each annotated with at least five crowdsourced judgments of how clickbait it is. We attempt to build an effective computational clickbait detection model on this dataset. We first considered a total of 331…
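To make the annotation scheme concrete, the sketch below shows one way several crowdsourced judgments per headline could be collapsed into a single target score and a binary label. It is a hedged illustration only: the four-point scale (0, 0.33, 0.66, 1), the use of the mean, and the 0.5 threshold are assumptions, and `aggregate_judgments` is a hypothetical helper rather than part of the dataset tooling.

```python
from statistics import mean
from typing import List, Tuple


def aggregate_judgments(judgments: List[float],
                        threshold: float = 0.5) -> Tuple[float, bool]:
    """Collapse per-annotator scores into (mean score, is_clickbait).

    Assumes each judgment lies in [0, 1] and that a headline counts as
    clickbait when the mean reaches `threshold` (an assumed cut-off).
    """
    if len(judgments) < 5:
        # The abstract states that every headline has at least five judgments.
        raise ValueError("expected at least five judgments per headline")
    score = mean(judgments)
    return score, score >= threshold


# Hypothetical judgments for two headlines.
score_a, label_a = aggregate_judgments([1.0, 0.66, 0.66, 0.33, 1.0])
score_b, label_b = aggregate_judgments([0.0, 0.0, 0.33, 0.33, 0.66])
```

The mean score is a natural regression target, while the thresholded label supports reporting accuracy.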
Taxonomy
Topics: Misinformation and Its Impacts · Child Development and Digital Technology · Expert finding and Q&A systems
