Differences in sensationalism in international news media reporting of COVID-19: An exploratory analysis using the Global Public Health Intelligence Network (GPHIN) system
Joanna Przepiorkowski, Tenzin Norzin, Abdelhamid Zaghlool, Florence Tanguay, Dorcas Taylor, Victor Gallant, Linlu Zhao

TL;DR
This study explores how international news media in different languages reported on the early stages of the COVID-19 pandemic, finding that some languages used more sensationalized and negative language.
Contribution
The study introduces a mixed-methods analysis of sensationalism in multilingual news media reporting during the early phase of the COVID-19 pandemic.
Findings
155 of the 502 articles included in the analysis (from 951 screened) contained sensationalism.
Significant differences in sensationalism were found among French, Russian, and Spanish articles.
News articles with sensationalism had a more negative emotional tone.
Abstract
The Global Public Health Intelligence Network (GPHIN) is an event-based surveillance platform that collects thousands of pieces of open-source information, including international news media, across multiple languages on a daily basis. Analysts have observed that news media reporting in some languages tended to use more sensational wording to describe major health events. There has been minimal research exploring potential differences in sensationalism in international news media reporting to confirm these observations. This exploratory study assessed the differences in the level of sensationalism in early international news media reporting of COVID-19 through a mixed-methods analysis. Relevant news media articles received in GPHIN seven days following the Public Health Emergency of International Concern declaration of COVID-19 by the World Health Organization were extracted for…
Table 1

| Date | GPHIN system query | Search results |
|---|---|---|
| Date of PHEIC declaration by the WHO: January 30, 2020 | (Title/Text contains (exact match): coronavirus OR Title/Text contains (exact match): corona virus OR Title/Text contains (exact match): 2019-nCoV OR Title/Text contains (exact match): Wuhan pneumonia) AND (Title/Text contains (exact match): PHEIC OR Title/Text contains (exact match): public health emergency of international concern OR Title/Text contains all of the following (comma separated): international, emergency) AND Date received between 2020-01-30 and 2020-02-06 | 951 |

Table 2

| Domain | Question |
|---|---|
| Exposing | Does the article attempt to expose certain events? |
| Speculating | Does the article offer a guess or suggest what the future consequences of an issue are likely to be? |
| Generalizing | Does the article make generalizing statements that extrapolate a trend out of an incident or pass a judgement about a whole class of people? |
| Warning | Does the article generate anxiety about an issue or offer suggestions on how to avoid becoming a victim? |
| Extolling | Does the article exaggerate facts as extraordinary, project events as historic, praise individuals for heroic acts, etc.? |

Table 3

| Method | Summary |
|---|---|
| AFINN | AFINN is a lexicon-based sentiment analysis method. It assigns pre-defined sentiment scores (positive or negative) to individual words in a document. |
| Bing | Bing is another lexicon-based sentiment analysis method. It assigns positive, negative or neutral labels to individual words. |
| Syuzhet | Syuzhet is a sentiment analysis package in R that relies on sentiment lexicons and dictionaries. |
| National Research Council Canada (NRC) | The NRC |

Table 4

| Domain | Themes identified |
|---|---|
| Exposing | Criticism of China |
| Speculating | Unknown/grim outcome, unknown consequences |
| Generalizing | Discrimination of Chinese individuals and their “cause” of COVID-19 |
| Warning | Threat of COVID-19 spreading to other countries/worldwide |
| Extolling | Labelling COVID-19 as a monster/evil/demon/invisible killer |
Introduction
On January 7, 2020, Chinese authorities identified a novel coronavirus temporarily named “2019-nCoV” ((1)), which rapidly spread around the world. The World Health Organization (WHO) declared a public health emergency of international concern (PHEIC) on January 30, 2020. Due to the global spread of the virus, which came to be known as COVID-19, WHO characterized the outbreak as a pandemic on March 11, 2020 ((1)).
This pandemic was the first public health event of its kind with constant media coverage ((2)), becoming a political battleground, with leaders debating over public policy and medical interpretations ((2)). The pandemic also highlighted multiple social, cultural and economic issues arising from the media’s constant dissemination of information ((3)). Verified, official information, based on best information available at the time, was complicated by inaccurate claims amplified on various news media platforms, which proved to be almost as much of a threat to global public health as the virus itself ((4)).
Technological advancements and online news media create opportunities to keep people informed, connected and safe ((4)). However, they can also create the opportunity for sensationalizing issues by presenting news as more extraordinary, interesting or relevant than is objectively warranted, which can undermine global responses and jeopardize measures to control major public health events ((4)).
During global disasters such as pandemics, crisis communication is crucial to dispel fears and uncertainties and unify individuals worldwide against public health threats ((5)). Sensational communication, however, can result in negative personal and economic consequences ((5)). For new or emerging diseases, particularly when there is limited available information from official sources, sensational reporting may influence the risk assessment and response to the event implemented by decision-makers, as well as the perception of the risk of the event by the public ((6,7)).
The Global Public Health Intelligence Network (GPHIN) is an all-hazards event-based surveillance system that is operated by the Public Health Agency of Canada (PHAC) ((8)). The system was developed by the Government of Canada in collaboration with the WHO for use by non-governmental agencies and organizations, as well as government authorities who conduct public health surveillance ((9)). To identify potential public health threats, GPHIN collects and assesses thousands of pieces of open-source information on a daily basis through artificial intelligence algorithms (i.e., machine learning, natural language processing). Although GPHIN monitors a diverse array of open sources, most of the information is currently sourced from news media (i.e., news media in the context of this study refers to mass media that focus on delivering news in text format via the Internet to the public) ((8)). The information is then curated by a multicultural team of analysts, covering 10 languages (Arabic, Chinese [Simplified], Chinese [Traditional], English, Farsi, French, Hindi, Portuguese, Russian and Spanish).
Given their linguistic and cultural diversity, GPHIN analysts add a language and cultural perspective to the interpretation of international reporting that may otherwise be misinterpreted or misunderstood if only machine-translated English or only a Canadian cultural lens is used. Over time, GPHIN analysts have observed that news media in some languages tended to use more exaggerated or hyperbolic expressions/terms to describe new or emerging diseases.
There has been minimal research exploring potential differences in sensationalism in international news media reporting of major health events to confirm these observations, which presents a knowledge/research gap. To address this gap, this exploratory study assessed the differences in the level of sensationalism in early international news media reporting of COVID-19 through a mixed-methods analysis.
Methods
Relevant news media articles received in the GPHIN system in the first seven days following the declaration of the COVID-19 PHEIC on January 30, 2020, were identified and extracted (Table 1). The time restriction of the first seven days was chosen to observe the initial reaction of news media following the COVID-19 PHEIC declaration, so there is a shared baseline for the globally relevant health event. Articles in Arabic, Chinese (Simplified and Traditional), English, Farsi, French, Portuguese, Russian and Spanish were reviewed (the Hindi language was not yet implemented into the GPHIN system at the time of the study). Reports from non-news media sources, including official sources such as the WHO, European Centre for Disease Prevention and Control, and United States Centers for Disease Control and Prevention were excluded.
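The Boolean structure of the Table 1 query can be sketched in a few lines of Python. This is an illustration only, not the GPHIN implementation: the `Article` class is hypothetical, "exact match" is approximated here by a case-insensitive substring search, and the date bounds are assumed to be inclusive.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Article:
    """Hypothetical record; the fields mirror the Title/Text/Date received query terms."""
    title: str
    text: str
    received: date


# Search terms taken from Table 1, lowercased for case-insensitive matching.
VIRUS_TERMS = ("coronavirus", "corona virus", "2019-ncov", "wuhan pneumonia")
PHEIC_TERMS = ("pheic", "public health emergency of international concern")
ALL_OF_TERMS = ("international", "emergency")
START, END = date(2020, 1, 30), date(2020, 2, 6)


def matches_query(article: Article) -> bool:
    """Return True if the article satisfies all three AND-ed clauses of the query."""
    haystack = f"{article.title} {article.text}".lower()
    has_virus = any(term in haystack for term in VIRUS_TERMS)
    has_pheic = any(term in haystack for term in PHEIC_TERMS) or all(
        term in haystack for term in ALL_OF_TERMS
    )
    in_window = START <= article.received <= END  # assumption: inclusive bounds
    return has_virus and has_pheic and in_window
```

An article that mentions "coronavirus" but none of the PHEIC-related terms would be rejected by the second clause, which is how the query narrows results to coverage of the declaration itself.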
An adapted version of a tool by Hoffman et al. ((10)), which was piloted to measure the sensationalism of pandemic-related health news, was used to assess five domains of sensationalism, as described in Table 2. For this study, a binary “Yes/No” response was used for the tool, instead of the five-point Likert-like scale, where “Yes” represented the presence of sensational text and “No” represented the absence of sensational text. This modification was made to avoid the potential subjectivity of assessing the relative degree of sensationalism using the Likert-like scale (where the differences between “not too much,” “somewhat” and “fairly” sensationalizing are open to interpretation). An article was deemed to have overall sensationalism if at least one of the domains listed in Table 2 was selected as “Yes.”
An inclusion/exclusion assessment was performed using the criteria in Table 1 by two reviewers, with any disagreements resolved by consensus. The following data were extracted for each article included for analysis: assessment against each of the five domains of sensationalism, overall assessment of sensationalism (Sensationalism=Yes if at least one domain was selected and Sensationalism=No if no domains were selected), date of publication, country/territory of news media outlet and original language of publication. Data extraction was performed by one reviewer and validated by a second reviewer.
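The overall-sensationalism rule described above (Yes if at least one of the five domains is rated Yes) amounts to a logical OR over the domain ratings. A minimal sketch follows; the function name and the dictionary layout are illustrative assumptions, not part of the study's tooling.

```python
# The five domains of the adapted tool, as listed in Table 2.
DOMAINS = ("exposing", "speculating", "generalizing", "warning", "extolling")


def overall_sensationalism(ratings: dict[str, bool]) -> bool:
    """Return True (Sensationalism=Yes) if any domain was rated Yes.

    `ratings` maps each domain name to the consensus Yes/No rating for one
    article. A missing domain raises an error rather than defaulting to No.
    """
    missing = [d for d in DOMAINS if d not in ratings]
    if missing:
        raise ValueError(f"missing domain ratings: {missing}")
    return any(ratings[d] for d in DOMAINS)
```

Under this rule, an article rated Yes only for the warning domain counts as sensational overall, the same as an article rated Yes on all five.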
The title and body of each news media article included for analysis were independently appraised for sensationalism by two reviewers, with any disagreements resolved by consensus. For non-English articles, English analysts reviewed the machine-translated text in English, while GPHIN analysts with expertise in the language of the article performed a secondary review in the original language.
This study used a mixed-methods approach for analysis. The qualitative portion used thematic analysis, a flexible method that enables the identification of patterns of meaning (themes) across data sets by interrogating both semantic and latent meanings (i.e., content, ideas, assumptions) below the surface ((11)). In this study, deductive thematic analysis was used as themes were identified within each domain. There were 155 news media articles identified as having overall sensationalism and top themes within each domain were recorded.
For the quantitative portion of the study, the analysis of articles with overall sensationalism (Sensationalism=Yes) was performed using Stata IC 15.1. Differences in the prevalence of sensationalism in news media reporting by language were assessed using the chi-square test and Fisher’s exact test, depending on whether assumptions were met. Four sentiment analysis methods (AFINN, Bing, Syuzhet and National Research Council Canada [NRC]) were applied to assess the sentiment and emotional tone of news media articles ((12)). These sentiment analyses were done using algorithms implemented in the R programming language (see Table 3). Using the Welch two sample t-test function in R, the sentiments were then compared between news media articles with (“Yes”) and without (“No”) overall sensationalism to determine whether there were statistical differences in the sentiment and tone of describing and reporting on COVID-19. The text analyzed was strictly in English or machine-translated English due to R package restrictions.
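To illustrate how a lexicon-based method such as AFINN arrives at a document score, the sketch below sums per-word scores over a tokenized text. The study itself used R packages; this Python version is only an illustration, and the lexicon entries and their values are invented for the example (they are not taken from the actual AFINN word list).

```python
import re

# Toy lexicon with made-up integer scores, for illustration only.
TOY_LEXICON = {
    "deadly": -3,
    "fear": -2,
    "threat": -2,
    "safe": 1,
    "recover": 2,
    "hope": 2,
}


def lexicon_score(text: str, lexicon: dict[str, int] = TOY_LEXICON) -> int:
    """Sum the scores of all lexicon words in the text; unknown words score 0."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(token, 0) for token in tokens)


print(lexicon_score("Deadly threat sparks fear"))                 # -7
print(lexicon_score("Patients recover, officials express hope"))  # 4
```

A more negative total corresponds to a more negative emotional tone, which is the direction of comparison used for the AFINN, Bing and Syuzhet scores in this study.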
Results
Screening
From the GPHIN system, 951 articles were screened and assessed for eligibility. Of these, 449 were excluded as they did not meet the eligibility criteria. There were 200 English and 302 non-English articles included in the analysis (Figure 1). Out of 502 articles, 155 were identified as having overall sensationalism. The association between sensationalism and news media country/territory of publication was not found to be statistically significant (Appendix, Table A1) and, therefore, we could not explore the potential differences in reporting between countries or assess whether country/territory was a potential confounder.
Figure 1: Screening of news media related to the COVID-19 public health emergency of international concern on the Global Public Health Intelligence Network system
Qualitative analysis
Common themes observed within news media articles that had overall sensationalism are presented in Table 4.
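Across the five sensationalism domains, five top themes were observed, one per domain. The examples below are drawn from languages with statistically significant results for the respective domain.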
Exposing domain: News media in the French language exposed that local healthcare systems were becoming overwhelmed and saturated with patients ((13)). There was also criticism of how China was handling the COVID-19 situation and how it tried to maintain an image to the global community; however, the “social pressure was too much” ((14)).
Speculating domain: The French language news media speculated about what would happen to local businesses and the economy if COVID-19 spread and shut down countries ((15)). Articles also speculated about the true cause of COVID-19 and where it came from ((16)). They also questioned whether isolation measures ever worked ((17)). Similarly, news media in the Russian language speculated about whether COVID-19 would harm the economy ((18)).
Generalizing domain: Discriminatory undertones were found within news media in the French language. Articles stated that the cause of the situation was due to Chinese citizens and that it was Wuhan’s problem rather than a problem for the rest of the world ((14,19–21)).
Warning domain: For the articles with the elements of the warning domain, the focus was broad. Articles in the Spanish language warned about the situation of COVID-19 spreading to other countries and that COVID-19 was spreading much faster than the previous SARS outbreak in 2001 ((22–25)). Articles further warned readers that the situation in China was completely out of hand and that the issue of COVID-19 was extremely serious ((26)). There was also a notion that the virus was unstoppable, emphasizing the urgency and anxiety of the situation. A theme of warning, telling readers that they were dealing with a dangerous enemy, was also noted ((27–29)).
Extolling domain: Russian was the only language that was statistically significant for the extolling domain. Articles mentioned the “fight against evil” and that “a monster is born” for which “the world is not ready” ((30,31)).
Quantitative analysis
Sensationalism domains and language: English and Arabic were identified as having the highest number of articles with overall sensationalism (n=62 and n=24, respectively), while Farsi, Russian and Chinese were found to have the lowest number of articles with overall sensationalism (n=1, n=7 and n=7, respectively) (Figure 2). The French language was statistically significant for exposing (p=0.004), speculating (p=0.007) and generalizing (p=0.007). The Russian language was statistically significant for speculating (p=0.046), generalizing (p=0.046), warning (p=0.013), extolling (p=0.046) and overall sensationalism (p=0.004). The Spanish language was statistically significant for warning (p=0.034) and overall sensationalism (p=0.034).
Figure 2: News media articles reviewed by language with and without overall sensationalism
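The language comparisons rest on 2×2 tables (one language versus all others, sensationalism Yes versus No). The sketch below shows how a Pearson chi-square test and a two-sided Fisher's exact test can be computed for such a table using only the Python standard library. It is an illustration, not the study's Stata code; the function names are hypothetical, and no per-language table from the paper is reproduced here because the full counts are not given in the text.

```python
from math import comb, erfc, sqrt


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value, smallest_expected_count).
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    expected = [row1 * col1 / n, row1 * col2 / n, row2 * col1 / n, row2 * col2 / n]
    stat = sum((o - e) ** 2 / e for o, e in zip((a, b, c, d), expected))
    p_value = erfc(sqrt(stat / 2))  # upper tail of chi-square with 1 degree of freedom
    return stat, p_value, min(expected)


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(low, high + 1) if prob(x) <= p_observed * (1 + 1e-7))
```

A common convention (an assumption here; the paper says only that the choice depended on whether test assumptions were met) is to prefer Fisher's exact test when the smallest expected cell count returned by `chi_square_2x2` is below 5, which is likely for languages with very few sensational articles.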
Sentiment analysis: To determine whether differences in sentiment were statistically significant for the two groups (Overall Sensationalism=Yes vs. Overall Sensationalism=No), t-tests were performed in R.
· For the AFINN score comparison, t(240)=−3.8309, p<0.001
· For the Bing score comparison, t(235)=−4.7292, p<0.001
· For the Syuzhet score comparison, t(217)=−2.962, p<0.001
The p-values obtained from the tests indicated statistically significant differences in sentiment scores between the two groups. A more negative mean AFINN, Bing and Syuzhet score for the overall sensationalism news article group indicated a difference in the overall sentiment or emotional tone between these groups.
For the National Research Council Canada (NRC) sentiment scores, comparisons were made for negative, positive and fear sentiments. Using the t-test, the scores in the Overall Sensationalism=Yes news group were significantly higher for all three.
· The results for NRC-negative comparison: t(239)=5.483, df=239.67, p<0.001
· The results for NRC-positive comparison: t(247)=4.5944, p<0.001
· The results for NRC-fear comparison: t(254)=5.4729, p<0.001
The NRC sentiment scores for negative, positive and fear sentiments were significantly higher in the Overall Sensationalism=Yes news article group. This aligns with the expectation that sensationalism exaggerates emotions, including negative and fearful sentiments. The higher positive scores may be due to sensationalized content trying to elicit strong emotional reactions from readers.
Taken together, these results indicate a significant difference in sentiment and emotional tone between articles with sensationalism and those without sensationalism.
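The t statistics and degrees of freedom reported above come from Welch's two-sample t-test, which does not assume equal variances in the two groups and uses the Welch–Satterthwaite approximation for the degrees of freedom. That approximation is why a non-integer value such as df=239.67 can appear. The sketch below computes both quantities with the Python standard library; it is an illustration with made-up inputs rather than the study's R code, and it stops short of the p-value, which requires the t distribution.

```python
from statistics import mean, variance


def welch_t(sample_a: list[float], sample_b: list[float]) -> tuple[float, float]:
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / (se2_a + se2_b) ** 0.5
    # Welch-Satterthwaite degrees of freedom (generally not an integer).
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Made-up scores for illustration only.
t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10, 12])
print(round(t, 3), round(df, 2))  # -2.376 6.97
```

The sign of t depends on which group is passed first: a negative value means the first group has the lower mean. The order in which the two groups were supplied for the reported statistics is not stated in the text.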
Discussion
This study has demonstrated that even with machine translation, sensational language can still be understood, and this may have an influence on a reader’s perception of a given issue. Themes repeated in news media articles, regardless of language, may also impact and change the reader’s perceptions. Sensationalism in news media reporting could have impacted how COVID-19 was perceived after the PHEIC declaration. The analysis of sentiment scores indicated a clear and statistically significant difference in sentiment and emotional tone between sensational and non-sensational articles.
As seen in our study, media may use elements from warning, extolling, speculating and/or exaggerating domains of sensationalism to capture the reader’s attention and sway the reader in a particular direction. This could be damaging to readers as, oftentimes, when reading a news article, they may not be able to do anything to prevent or reduce the issue’s risk, which could increase perceived vulnerability, creating anxiety ((32)). This idea complements further evidence suggesting that the more access one has to information, the more stressed one may become, potentially inducing unnecessary fear and concern ((33)).
Our study did not look at the effects of the use of sensationalism in social media; however, our findings on traditional news media complement research on both social media and traditional news media and the potential misperception of information regarding COVID-19. A study by Montezari et al. highlighted that exposure to COVID-19 news on social media was significantly correlated with increased feelings of anxiety and fear, as well as behavioural changes ((32)). Ravenelle et al. observed that increased media consumption was linked to decreased mental health and a sense of unhealthiness ((34)). Other research studies have found that even if one has a highly curated social media feed, there is still a possibility for media to contain misconstrued messages and information ((35,36)). Dechene et al. noted that with increasing cumulative exposure to exaggerated information, users are more likely to experience a “reinforcement effect,” where familiarity leads to a stronger change in opinion and belief ((37)).
Limitations
A limitation of this exploratory study was that the articles included for review were restricted to those picked up through the GPHIN system and therefore may not be representative of all news media articles available online. Misclassification bias could have occurred when reviewing articles for sensationalism. The analysis covered only the seven days following the PHEIC declaration; therefore, we were unable to conduct a trend analysis over time to see if there was a change in the language used in the reporting of COVID-19 by the media. Although efforts were made to minimize translation and cultural bias by including language specialists in the sensationalism assessments, there was likely residual language/cultural bias due to the study team living in Canada and working in English/French. The validity of the pilot tool used in this exploratory study to assess sensationalism has not yet been established.
Conclusion/future research
Having ready access to international news media reporting allows individuals to be informed and connected; however, as demonstrated in this study, news media sources may be prone to sensationalism and should be interpreted with caution. The findings of this exploratory study suggest that it would be beneficial for tools to be developed that can help analysts and users of event-based surveillance systems flag potentially sensational articles so that the information could be appropriately assessed and used to inform decision-making. Such tools do not currently exist, although there are similar tools (e.g., websites such as mediabiasfactcheck.com) that provide information on political standpoints and the trustworthiness of news media sources. Additional research is needed to refine and validate tools for assessing sensationalism in news media and other media, as well as to examine the topic across different health events and time periods to identify broader trends.
References
1. World Health Organization. Coronavirus disease (COVID-19). Geneva, CH: WHO. https://www.who.int/europe/health-topics/coronavirus
2. Nelson T, Kagan N, Critchlow C, Hillard A, Hsu A. The Danger of Misinformation in the COVID-19 Crisis. Mo Med 2020;117(6):510–2. PMID: 33311767; PMCID: PMC7721433
3. Anwar A, Malik M, Raees V, Anwar A. Role of Mass Media and Public Health Communications in the COVID-19 Pandemic. Cureus 2020;12(9):e10453. doi: 10.7759/cureus.10453. PMID: 33072461; PMCID: PMC7557800
4. Volkmer I. Social media and COVID-19: A global study of digital crisis interaction among Gen Z and millennials. Melbourne, AU: University of Melbourne; 2021. doi: 10.46580/124367
5. Su Z, McDonnell D, Wen J, Kozak M, Abbas J, Šegalo S, Li X, Ahmad J, Cheshmehzangi A, Cai Y, Yang L, Xiang YT. Mental health consequences of COVID-19 media coverage: the need for effective crisis communication practices. Global Health 2021 Jan;17(1):4. doi: 10.1186/s12992-020-00654-4. PMID: 33402169; PMCID: PMC7784222
6. Pratama AR, Firmansyah FM. COVID-19 mass media coverage in English and public reactions: a West-East comparison via Facebook posts. PeerJ Comput Sci 2022;8:e1111. doi: 10.7717/peerj-cs.1111. PMID: 36262131; PMCID: PMC9575862
7. Ottwell R, Puckett M, Rogers T, Nicks S, Vassar M. Sensational media reporting is common when describing COVID-19 therapies, detection methods, and vaccines. J Investig Med 2021;69(6):1256–7. doi: 10.1136/jim-2020-001760. PMID: 34021053
8. Public Health Agency of Canada. About GPHIN. Ottawa, ON: PHAC. https://gphin.canada.ca/cepr/aboutgphin-rmispenbref.jsp
