COVID-19 on YouTube: A Data-Driven Analysis of Sentiment, Toxicity, and Content Recommendations
Vanessa Su, Nirmalya Thakur

TL;DR
This paper analyzes COVID-19 content on YouTube using NLP techniques to assess sentiment, toxicity, and themes, and develops a recommendation system that promotes relevant and responsible pandemic-related videos.
Contribution
It introduces a comprehensive framework combining sentiment, toxicity, and topic analysis with a tailored recommendation system for COVID-19 YouTube content.
Findings
Sentiment of video descriptions: 49.32% positive, 36.63% neutral, 14.05% negative
0.91% of content identified as toxic
69% coverage with five recommendations
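The coverage figure above is not defined on this page. One common reading, assumed here, is catalog coverage: the share of videos in the dataset that appear in at least one top-k recommendation list. The sketch below illustrates that metric only; the function name, the input layout, and the interpretation itself are assumptions and not the authors' implementation.

```python
def catalog_coverage(recommendations, catalog, k=5):
    """Fraction of catalog items that appear in at least one top-k list.

    recommendations: dict mapping a query video ID to its ranked list of
                     recommended video IDs (hypothetical layout).
    catalog:         iterable of all video IDs in the dataset.
    k:               number of recommendations kept per query video.
    """
    catalog = set(catalog)
    if not catalog:
        return 0.0

    recommended = set()
    for ranked_ids in recommendations.values():
        # Only the first k recommendations count toward coverage.
        recommended.update(ranked_ids[:k])

    # Ignore any recommended ID that is not part of the catalog.
    return len(recommended & catalog) / len(catalog)
```

Under this reading, "69% coverage with five recommendations" would correspond to `catalog_coverage(recs, all_video_ids, k=5)` returning roughly 0.69.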
Abstract
This study presents a data-driven analysis of COVID-19 discourse on YouTube, examining the sentiment, toxicity, and thematic patterns of video content published between January 2023 and October 2024. The analysis applies three natural language processing (NLP) techniques: sentiment analysis with VADER, toxicity detection with Detoxify, and topic modeling with Latent Dirichlet Allocation (LDA). Sentiment analysis found that 49.32% of video descriptions were positive, 36.63% were neutral, and 14.05% were negative, indicating a generally informative and supportive tone in pandemic-related content. Toxicity analysis flagged only 0.91% of content as toxic, suggesting that exposure to such content is minimal. Topic modeling revealed two main themes, with 66.74% of the videos covering general health information and pandemic-related impacts and 33.26% focused on news and…
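The positive/neutral/negative split reported in the abstract can be illustrated with a minimal sketch. VADER produces a compound score in [-1, 1] for each text; with the `vaderSentiment` package this is typically read from `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`. The cutoffs of 0.05 and -0.05 used below are the conventional VADER thresholds and are an assumption here, since the abstract does not state which thresholds the authors applied. The function names are hypothetical.

```python
from collections import Counter

# Conventional VADER cutoffs (assumed; not confirmed by the abstract).
POS_THRESHOLD = 0.05
NEG_THRESHOLD = -0.05


def label_sentiment(compound):
    """Map a VADER compound score in [-1, 1] to a sentiment label."""
    if compound >= POS_THRESHOLD:
        return "positive"
    if compound <= NEG_THRESHOLD:
        return "negative"
    return "neutral"


def sentiment_distribution(compound_scores):
    """Return the percentage of texts in each sentiment class."""
    labels = [label_sentiment(score) for score in compound_scores]
    total = len(labels)
    names = ("positive", "neutral", "negative")
    if total == 0:
        return {name: 0.0 for name in names}
    counts = Counter(labels)
    return {name: round(100 * counts[name] / total, 2) for name in names}
```

Applied to the compound scores of all video descriptions in a dataset, `sentiment_distribution` would yield a breakdown in the same form as the percentages reported above.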
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Misinformation and Its Impacts · Sentiment Analysis and Opinion Mining
