Quantifying correlations between information overload and fake news during COVID-19 pandemic: a Reddit study with BERT model approach
Jan Rawa, Julian Sienkiewicz

TL;DR
This study explores the relationship between information overload and fake news on Reddit during COVID-19, using a Gini index computed from BERTopic topic distributions and fake news labels from the FakeBERT classifier. It finds a significant global correlation but ambiguous community-level results.
Contribution
It introduces a novel proxy for information overload, the Gini index of the topic distribution obtained from BERTopic, and applies it to large social media datasets to analyze correlations with fake news.
Findings
Significant global correlation between Gini index and fake news fraction.
Gini index can serve as a proxy for information overload in large datasets.
Community-level correlations are ambiguous and require further investigation.
Abstract
Information overload (IOL) is a well-known and devastating phenomenon that impairs performance on all types of tasks. It has been shown that in the media space, IOL can contribute to news fatigue and news avoidance, which often leads to the proliferation of fake news posts on social networks. However, automatic methods for tracking IOL in large datasets are lacking. In this study, we investigate whether the Gini index calculated from the distribution of topics obtained via the BERTopic model can be considered a proxy for IOL. We test our assumptions on a set of Reddit communities related to the COVID-19 pandemic and obtain a significant global correlation between the Gini index and the fraction of fake news detected by the FakeBERT classifier. However, at the community level, the correlation analysis results are ambiguous.
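The quantity at the center of the abstract, a Gini index over a topic distribution, can be illustrated with a minimal sketch. This is not the authors' code: the function names `gini_index` and `pearson_r`, the use of raw per-topic post counts as input, and the choice of a Pearson coefficient are assumptions made for illustration. In practice the counts would come from a topic model such as BERTopic and the fake news fractions from a classifier such as FakeBERT.

```python
import numpy as np


def gini_index(topic_counts):
    """Gini index of non-negative per-topic post counts.

    Returns 0.0 when posts are spread evenly across topics and
    (n - 1) / n when all posts fall into one of n topics.
    """
    x = np.sort(np.asarray(topic_counts, dtype=float))  # ascending order
    n = x.size
    total = x.sum()
    if n == 0 or total == 0:
        return 0.0
    ranks = np.arange(1, n + 1)
    # Standard rank-based formula for the Gini coefficient of a sorted sample.
    return float(2.0 * np.sum(ranks * x) / (n * total) - (n + 1) / n)


def pearson_r(a, b):
    """Pearson correlation between two equally long series (assumed measure)."""
    return float(np.corrcoef(a, b)[0, 1])


# Hypothetical usage: one count vector per time window.
even_window = [25, 25, 25, 25]    # posts spread evenly over four topics
skewed_window = [0, 0, 0, 100]    # every post in a single topic
print(gini_index(even_window), gini_index(skewed_window))
```

Given one Gini value and one fake news fraction per time window, `pearson_r(gini_series, fake_fraction_series)` would then yield a single correlation coefficient for the whole dataset or for one community.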
