COVID-19 on YouTube: A Data-Driven Analysis of Sentiment, Toxicity, and Content Recommendations

Vanessa Su; Nirmalya Thakur

arXiv:2412.17180·cs.SI·December 24, 2024

Open Access

TL;DR

This paper analyzes COVID-19 content on YouTube using NLP techniques to assess sentiment, toxicity, and themes, and develops a recommendation system that promotes relevant and responsible pandemic-related videos.

Contribution

It introduces a comprehensive framework combining sentiment, toxicity, and topic analysis with a tailored recommendation system for COVID-19 YouTube content.

Findings

01 49.32% positive, 36.63% neutral, 14.05% negative sentiment

02 0.91% of content identified as toxic

03 69% coverage with five recommendations

Abstract

This study presents a data-driven analysis of COVID-19 discourse on YouTube, examining the sentiment, toxicity, and thematic patterns of video content published between January 2023 and October 2024. The analysis involved applying advanced natural language processing (NLP) techniques: sentiment analysis with VADER, toxicity detection with Detoxify, and topic modeling using Latent Dirichlet Allocation (LDA). The sentiment analysis revealed that 49.32% of video descriptions were positive, 36.63% were neutral, and 14.05% were negative, indicating a generally informative and supportive tone in pandemic-related content. Toxicity analysis identified only 0.91% of content as toxic, suggesting minimal exposure to toxic content. Topic modeling revealed two main themes, with 66.74% of the videos covering general health information and pandemic-related impacts and 33.26% focused on news and…
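The sentiment percentages reported above come from labelling each video description and tallying the labels. A minimal sketch of that tallying step follows, assuming the commonly used VADER cut-offs on the compound score (at or above 0.05 is positive, at or below -0.05 is negative, anything between is neutral). The abstract does not state which thresholds the authors applied, so the constants, the function names `label_from_compound` and `sentiment_distribution`, and the sample scores are all illustrative assumptions. In practice the compound score would typically come from `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]` in the `vaderSentiment` package; the sketch takes precomputed scores so it runs with the standard library alone.

```python
from collections import Counter

# Conventional VADER cut-offs on the compound score (assumed here; the
# abstract does not state which thresholds the authors used).
POS_THRESHOLD = 0.05
NEG_THRESHOLD = -0.05

LABELS = ("positive", "neutral", "negative")


def label_from_compound(compound: float) -> str:
    """Map a compound score in [-1, 1] to a sentiment label."""
    if compound >= POS_THRESHOLD:
        return "positive"
    if compound <= NEG_THRESHOLD:
        return "negative"
    return "neutral"


def sentiment_distribution(compounds):
    """Return the percentage of items per label, rounded to 2 decimals."""
    if not compounds:
        return {label: 0.0 for label in LABELS}
    counts = Counter(label_from_compound(c) for c in compounds)
    total = len(compounds)
    return {label: round(100.0 * counts[label] / total, 2) for label in LABELS}


if __name__ == "__main__":
    # Hypothetical compound scores, for illustration only (not the paper's data).
    example_scores = [0.62, 0.0, -0.41, 0.05, 0.01, 0.88, -0.05, 0.3]
    print(sentiment_distribution(example_scores))
```

Keeping the thresholds as named constants makes it easy to check how sensitive the positive, neutral, and negative shares are to the chosen cut-offs.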

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Misinformation and Its Impacts · Sentiment Analysis and Opinion Mining