Written and spoken corpus of real and fake social media postings about COVID-19

Ng Bee Chin, Ng Zhi Ee Nicole, Kyla Kwan, Lee Yong Han Dylann, Liu Fang, Xu Hong

arXiv:2310.04237 · cs.CL · October 9, 2023

Open Access

TL;DR

This paper analyzes linguistic differences between real and fake COVID-19 news in social media posts and videos, revealing patterns that distinguish truthful content from misinformation in both text and speech data.

Contribution

The paper introduces a combined corpus of COVID-19 related tweets and TikTok videos and applies LIWC analysis to identify linguistic features that differentiate real from fake news.

Findings

01 Identified linguistic markers distinguishing fake from real news

02 Demonstrated differences in language use across text and speech data

03 Provided insights into language patterns influencing misinformation spread

Abstract

This study investigates the linguistic traits of fake news and real news. There are two parts to this study: text data and speech data. The text data for this study consisted of 6420 COVID-19 related tweets re-filtered from Patwa et al. (2021). After cleaning, the dataset contained 3049 tweets, with 2161 labeled as 'real' and 888 as 'fake'. The speech data for this study was collected from TikTok, focusing on COVID-19 related videos. Research assistants fact-checked each video's content using credible sources and labeled them as 'Real', 'Fake', or 'Questionable', resulting in a dataset of 91 real entries and 109 fake entries from 200 TikTok videos with a total word count of 53,710 words. The data was analysed using the Linguistic Inquiry and Word Count (LIWC) software to detect patterns in linguistic data. The results indicate a set of linguistic features that distinguish fake news from…
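The abstract describes counting word categories with LIWC and comparing the rates between real and fake items. LIWC itself is proprietary software with its own dictionary, so the following is only a minimal sketch of the general idea, not the authors' pipeline. The category names, the word lists in `TOY_CATEGORIES`, and the helper functions `category_rates` and `mean_rates` are all hypothetical and chosen purely for illustration.

```python
import re
from collections import Counter

# Hypothetical toy categories. The real LIWC dictionary is proprietary,
# much larger, and is NOT reproduced here.
TOY_CATEGORIES = {
    "negemo": {"fear", "panic", "deadly", "hoax"},
    "certainty": {"always", "never", "definitely", "proven"},
    "pronoun_i": {"i", "me", "my"},
}


def category_rates(text):
    """Return, per toy category, the percentage of tokens in that category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    counts = Counter()
    for token in tokens:
        for name, words in TOY_CATEGORIES.items():
            if token in words:
                counts[name] += 1
    return {
        name: (100.0 * counts[name] / total if total else 0.0)
        for name in TOY_CATEGORIES
    }


def mean_rates(texts):
    """Average the per-text category percentages over a group of texts."""
    if not texts:
        return {name: 0.0 for name in TOY_CATEGORIES}
    per_text = [category_rates(t) for t in texts]
    return {
        name: sum(r[name] for r in per_text) / len(per_text)
        for name in TOY_CATEGORIES
    }


if __name__ == "__main__":
    # Invented example posts, not taken from the corpus.
    fake_like = ["I never trust it, the deadly hoax is proven"]
    real_like = ["Health officials reported new case counts today"]
    print("fake-like:", mean_rates(fake_like))
    print("real-like:", mean_rates(real_like))
```

Comparing the two dictionaries returned by `mean_rates` gives a rough per-category contrast between groups. A real analysis would use the licensed LIWC dictionary and a statistical test rather than a comparison of raw means.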

Taxonomy

Topics: Misinformation and Its Impacts · Digital Communication and Language · Hate Speech and Cyberbullying Detection