Fake News Data Collection and Classification: Iterative Query Selection for Opaque Search Engines with Pseudo Relevance Feedback
Aviad Elyashar, Maor Reuben, and Rami Puzis

TL;DR
This paper introduces an iterative query selection method to retrieve and classify fake news from opaque search engines, resulting in a large, publicly available dataset that advances fake news detection research.
Contribution
The paper presents a novel iterative query selection algorithm (IQS) that improves search results from black-box engines and enables large-scale fake news data collection.
Findings
IQS outperforms state-of-the-art retrieval methods.
The collected dataset contains about 70K news items and 22M accounts.
The dataset improves fake news detection accuracy.
Abstract
Retrieving information from an online search engine is the first and most important step in many data mining tasks. Most of the search engines currently available on the web, including all social media platforms, are black boxes (a.k.a. opaque) that support short keyword queries. In these settings, automatically retrieving all posts and comments discussing a particular news item at large scale is a challenging task. In this paper, we propose a method for generating short keyword queries given a prototype document. The proposed iterative query selection algorithm (IQS) interacts with the opaque search engine to iteratively improve the query. It is evaluated on the Twitter TREC Microblog 2012 and TREC-COVID 2019 datasets, showing superior performance compared to the state of the art. IQS is applied to automatically collect a large-scale fake news dataset of about 70K true and fake news…
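The abstract describes IQS only at a high level. As a rough illustration, and not the authors' algorithm, the following is a minimal greedy hill-climbing sketch under these assumptions: the opaque engine is any function from a keyword string to a list of result texts; a query's quality is estimated by pseudo relevance feedback, here taken to be the mean bag-of-words cosine similarity between the returned results and the prototype document; and each iteration tries adding, swapping, or dropping one prototype term, keeping the change only if the score strictly improves. All names (iterative_query_selection, query_score, toy_search, CORPUS) and the toy corpus are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical interface: an opaque engine maps a keyword string to result texts.
SearchEngine = Callable[[str], List[str]]


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def query_score(query: List[str], search: SearchEngine, proto_vec: Counter) -> float:
    """Assumed pseudo relevance feedback: mean similarity of results to the prototype."""
    results = search(" ".join(query))
    if not results:
        return 0.0
    return sum(cosine(Counter(tokenize(r)), proto_vec) for r in results) / len(results)


def iterative_query_selection(
    prototype: str,
    search: SearchEngine,
    max_len: int = 3,
    max_iter: int = 10,
    stopwords: frozenset = frozenset(),
) -> Tuple[List[str], float]:
    """Hypothetical greedy hill climbing over short queries built from prototype terms."""
    proto_vec = Counter(t for t in tokenize(prototype) if t not in stopwords)
    vocab = sorted(proto_vec)
    query: List[str] = []
    best = float("-inf")
    for _ in range(max_iter):
        candidates = []
        for w in vocab:
            if w in query:
                continue
            if len(query) < max_len:
                candidates.append(query + [w])  # add a term
            for i in range(len(query)):
                candidates.append(query[:i] + [w] + query[i + 1:])  # swap a term
        if len(query) > 1:
            for i in range(len(query)):
                candidates.append(query[:i] + query[i + 1:])  # drop a term
        if not candidates:
            break
        # Each candidate costs one call to the opaque engine; ties keep the first.
        score, cand = max(
            ((query_score(c, search, proto_vec), c) for c in candidates),
            key=lambda sc: sc[0],
        )
        if score <= best:
            break  # no strict improvement: stop
        query, best = cand, score
    return query, best


# Toy usage with a hypothetical corpus.
CORPUS = [
    "Officials: vaccine causes autism claim debunked",
    "Vaccine causes fever",
    "Autism claim goes viral",
    "Stock rumor debunked",
]


def toy_search(query: str) -> List[str]:
    """Toy stand-in for an opaque engine: AND semantics over keywords, top 10 results."""
    terms = tokenize(query)
    return [d for d in CORPUS if all(t in tokenize(d) for t in terms)][:10]


q, score = iterative_query_selection("Claim: vaccine causes autism (debunked)", toy_search)
print(q, round(score, 3))  # ['causes', 'autism'] 0.913
```

In this toy corpus no single prototype term retrieves only the first headline, so the search moves from a one-term query to a two-term query before it stops improving. In a real setting, the stopword list, the similarity measure, and the number of engine calls per iteration (one per candidate here) would all be design choices to revisit.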
Taxonomy
Topics: Misinformation and Its Impacts · Spam and Phishing Detection · Web Data Mining and Analysis
