Sentiment analysis of cancer screening in Chinese social media: Qualitative studies based on machine learning
Qi Zhou, Lingling Qian, Luyu Wu, Haiqian Wu, Junwei Ye, Qinrou Yu, Xiangnan Gu, Yueli Zhu

TL;DR
This study explores public sentiments about cancer screening in China using social media data and machine learning to understand perceptions and inform better health communication strategies.
Contribution
The novel contribution is the use of sentiment analysis on Chinese social media to identify emotional responses and barriers to cancer screening.
Findings
Seven distinct emotional categories were identified in public discussions about cancer screening.
Negative emotions like fear and stigma were found to hinder screening participation.
Findings suggest strategies to improve nursing communication and public screening engagement.
Abstract
This study explored public perceptions and sentiments about cancer screening on social media. The dissemination of misinformation and negative attitudes continues to impede access to cancer screening services for many individuals with perceived risk, despite their awareness of the concept and necessity of early cancer screening. The study was divided into five steps: data collection, data cleaning, data standardization, sentiment analysis, and content analysis. It analyzed 796 social media comments (53,151 words) from Weibo, Zhihu, and Xiaohongshu to explore public sentiments toward cancer screening. Seven emotion categories emerged: good, happy, surprise, anger, disgust, fear, and sadness. Positive emotions reflected trust in physicians, financial support, and perceived screening effectiveness, whereas negative emotions reflected fear of cancer, stigma, and procrastination. The…
1. Introduction
Cancer is a leading cause of morbidity and mortality worldwide, with lung, colorectal, and breast cancers among the most common [1]. Prevention and early detection are essential to reducing incidence and mortality [2,3]. Cancer screening enhances cure rates, decreases treatment costs, preserves quality of life, and promotes health awareness and healthy lifestyles [2].
The dissemination of misinformation and negative attitudes continues to impede access to cancer screening services for many individuals with perceived risk, despite their awareness of the concept and necessity of early cancer screening [2]. Specifically, individuals often fail to comply with screening recommendations because of a lack of knowledge, personal health beliefs, low self-efficacy, or low motivation [4–6]. Acceptance rates for cervical, colorectal, and breast cancer screening are declining [3]. Consequently, measures are needed to increase the acceptance of screening [2,7,8].
Social media can increase the acceptance of cancer screening [9–11]. For caregivers, social media allows nurses to use their influence as opinion leaders to promote health education and cancer screening [12]. For the public, social media provides a wealth of information about the importance of cancer screening, its risks and benefits, and the available screening methods [13]. Many cancer patients and previous screening participants use social media to discuss their firsthand experiences and feelings, which helps others understand the screening procedure and its results and alleviates their fears. However, researchers have largely neglected the comments that express users’ perceptions and sentiments about cancer screening [14,15].
Public attitudes heavily influence the decision to use cancer screening services [16–18]. Attitudes reflect the public’s perception of health hazards and preventive measures [19,20]. Participation in and acceptance of cancer screening may increase when the general public recognizes its benefits or is influenced by others. Unfortunately, research on public sentiment toward cancer screening is limited. It covers only a few types of cancer and lacks in-depth sentiment analysis [21]. Existing work focuses primarily on particular cancers, such as breast and colon cancer, and on various treatment techniques [22]. There is an urgent need for comprehensive cancer screening services that can evaluate individual screening needs, optimize resource allocation, and increase public acceptance and participation. In addition, most studies stop at classifying sentiment as positive or negative and do not analyze the underlying causes and attitudes [16,23], which limits the applicability of their findings to health education. While prior studies examined public sentiment toward specific cancer types (e.g., breast, colorectal), few have systematically analyzed emotions surrounding cancer screening more broadly in the Chinese social media context. Cultural, linguistic, and healthcare system differences may shape public perceptions uniquely, highlighting the need for this study.
Traditional methods for eliciting the public’s affective attitudes, including questionnaires and interviews, have certain disadvantages [2]. In interviews, honest opinions are difficult to obtain because respondents may fear giving offense when they express negative attitudes. Questionnaires can be costly, and their specific questions limit the range of responses [22]. Analyses based on social media comments may therefore overcome the limitations of traditional survey methods and give researchers a fresh perspective for future studies.
According to the literature, cancer screening research has primarily drawn on English-language platforms such as Twitter and YouTube [22,23]. Research on cancer screening discourse on the Chinese Internet is therefore needed. The large number of cancer patients in China and differences in culture, health policies, and healthcare systems can influence the implementation, promotion, and acceptance of cancer screening [15,20].
In summary, this study addresses these gaps by analyzing social media comments across three major Chinese platforms, applying both sentiment and content analysis. We extend prior literature by capturing a wider range of emotions, identifying their implications for health communication, and emphasizing how nurses can respond to the public’s emotional needs.
2. Method
2.1. Data collection
Comments were collected from Zhihu (https://www.zhihu.com/), Xiaohongshu (https://www.xiaohongshu.com), and Weibo (https://www.weibo.com) from 2012 to August 10, 2024, using keywords such as “screening,” “mass screening,” “cancer screening,” and “secondary prevention.” All available comments matching these criteria during the timeframe were included, resulting in 796 comments. Following King’s recommendations [10], users were classified as non-healthcare, health professionals, media organizations, or unknown. The non-healthcare type comprises self, family/friends, celebrities, and well-known figures. This step was implemented using Python’s Selenium library. The steps are shown in Fig 1.
Fig 1. Algorithm flow.
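The inclusion rule described above (keyword match within the collection window) can be sketched as a simple filter. This is only an illustration under stated assumptions: the comment dictionaries with `text` and `posted` fields are hypothetical, the keywords are the English glosses quoted in the text rather than the original Chinese queries, and the 1 January 2012 start date is assumed because the source gives only the year. The scraping itself (done with Selenium in the study) is not reproduced here.

```python
from datetime import date

# English glosses of the search keywords quoted in the text; the original
# Chinese query strings are not given in the source.
KEYWORDS = ("screening", "mass screening", "cancer screening", "secondary prevention")

# Collection window. The 1 January start is an assumption ("from 2012").
START, END = date(2012, 1, 1), date(2024, 8, 10)


def matches(comment: dict) -> bool:
    """Return True if the comment falls in the window and mentions a keyword."""
    in_window = START <= comment["posted"] <= END
    text = comment["text"].lower()
    return in_window and any(keyword in text for keyword in KEYWORDS)


def select_comments(comments: list[dict]) -> list[dict]:
    """Keep only the comments that satisfy the inclusion rule."""
    return [c for c in comments if matches(c)]
```

Applied to a list of scraped comments, `select_comments` returns the subset that would enter the cleaning step.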
2.2. Data cleaning
Chinese text preprocessing was performed using the jieba segmentation library. Stop words were removed using a Chinese stop-word dictionary, and synonyms were unified to improve consistency. This step was implemented using Python’s NumPy library.
1. Remove noisy data: Non-text elements such as HTML tags, emoticons, and URLs were removed to reduce irrelevant content.
2. Standardisation: Symbols, spaces, and abbreviations were converted to standard forms without changing the meaning of the text.
3. Spell checking: A spell checker quickly identified spelling mistakes in the text, improving the speed and accuracy of the algorithm.
4. Remove stop words: Common words unrelated to sentiment analysis, such as ‘of,’ ‘is,’ and ‘in,’ were removed.
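A minimal sketch of the noise-removal, standardisation, and stop-word steps, using only the Python standard library. It is not the authors' pipeline: the regular expressions, the emoji ranges, and the tiny `STOP_WORDS` set are illustrative assumptions, and word segmentation (jieba in the study) and spell checking are left out, so `remove_stop_words` expects text that has already been tokenized.

```python
import re
import unicodedata

TAG_RE = re.compile(r"<[^>]+>")        # HTML tags
URL_RE = re.compile(r"https?://\S+")   # URLs
# A few common emoji and symbol ranges; not an exhaustive list.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
SPACE_RE = re.compile(r"\s+")

# Hypothetical stop-word set; the study used a full Chinese stop-word dictionary.
STOP_WORDS = {"of", "is", "in", "的", "是", "在"}


def clean_text(text: str) -> str:
    """Standardise characters, then strip tags, URLs, emoji, and extra spaces."""
    text = unicodedata.normalize("NFKC", text)  # e.g. full-width "！" becomes "!"
    for pattern in (TAG_RE, URL_RE, EMOJI_RE):
        text = pattern.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Drop tokens that appear in the stop-word set."""
    return [tok for tok in tokens if tok not in STOP_WORDS]
```

Tags are removed before URLs so that a link inside an attribute disappears together with its tag.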
2.3. Data standardization
Data standardization improves the accuracy of sentiment analysis and content analysis. This step was implemented using the TfidfVectorizer class of Python’s scikit-learn library.
1. Segmentation: Each sentence was split into its key components so that the algorithm could process it more reliably.
2. Feature extraction: Textual data were transformed into numerical feature vectors.
3. Constructing the matrix: The TF-IDF document-word matrix has one row per text and one column per word; each element is the TF-IDF weight of that word in that text, derived from the word’s occurrence count.
4. Normalisation: All text vectors were brought to the same scale, eliminating the effect of text length.
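To make the matrix construction and normalisation steps concrete, here is a pure-Python sketch of a TF-IDF document-word matrix. It is a simplified stand-in for `TfidfVectorizer`, not a replacement: it assumes pre-tokenized input, and it uses a smoothed IDF, ln((1 + n) / (1 + df)) + 1, followed by L2 row normalisation, which is intended to mirror scikit-learn's documented defaults. The function name `tfidf_matrix` is hypothetical.

```python
import math
from collections import Counter


def tfidf_matrix(docs: list[list[str]]) -> tuple[list[str], list[list[float]]]:
    """Build an L2-normalised TF-IDF matrix from tokenized documents.

    Returns the sorted vocabulary and one row of weights per document.
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed inverse document frequency.
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1 for tok in vocab}

    rows = []
    for doc in docs:
        counts = Counter(doc)
        row = [counts[tok] * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(w * w for w in row))
        # Dividing by the L2 norm removes the effect of text length.
        rows.append([w / norm if norm else 0.0 for w in row])
    return vocab, rows
```

A word that occurs in every text gets a lower weight than an equally frequent word that occurs in only one, which is what lets distinctive emotion words stand out.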
2.4. Sentiment analysis
1. Select a lexicon: We used the Dalian University of Technology emotion dictionary, which records each word’s lexical category, emotion category, emotion intensity, and polarity. The lexicon builds on Ekman’s emotion classification approach with adaptations for Chinese [24–26].
2. Lexical matching: First, the cleaned text was matched against the sentiment lexicon. Second, the emotion words in the text were identified. Finally, these words were labeled as positive, negative, or neutral.
3. Score each token: The labeled emotion words were scored using weighting and counting methods.
4. Verification: Two independent coders manually labeled a random subset of 100 comments to check the accuracy of the sentiment analysis. Inter-rater reliability was high (Cohen’s Kappa = 0.82), indicating consistent labeling.
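The matching, scoring, and verification steps can be sketched as follows. The sketch rests on assumptions: the five-entry `LEXICON` and its intensity values are invented for illustration and are not taken from the Dalian University of Technology dictionary, and summing intensities per category is only one plausible reading of "weighting and counting." `cohens_kappa` implements the standard two-rater formula, kappa = (p_o - p_e) / (1 - p_e).

```python
from collections import Counter, defaultdict

# Hypothetical lexicon entries: word -> (emotion category, intensity).
LEXICON = {
    "trust": ("good", 5),
    "relieved": ("happy", 5),
    "terrifying": ("fear", 9),
    "nightmare": ("fear", 7),
    "regret": ("sadness", 5),
}


def score_emotions(tokens: list[str]) -> dict[str, int]:
    """Sum lexicon intensities per emotion category for the matched tokens."""
    scores: dict[str, int] = defaultdict(int)
    for tok in tokens:
        if tok in LEXICON:
            category, intensity = LEXICON[tok]
            scores[category] += intensity
    return dict(scores)


def dominant_emotion(tokens: list[str]) -> str:
    """Return the highest-scoring category, or 'neutral' if nothing matched."""
    scores = score_emotions(tokens)
    return max(scores, key=scores.get) if scores else "neutral"


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two raters who labeled the same items.

    Undefined (division by zero) when expected agreement equals 1.
    """
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)
```

In a check like the one described above, `labels_a` and `labels_b` would hold the two coders' labels for the same 100 comments.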
2.5. Content analysis
We used the K-Means algorithm for content analysis, building on the sentiment analysis results, to obtain comprehensive and insightful themes. K-Means is an unsupervised machine-learning method that explores potential patterns or themes in textual data [27]. Previous research by our team has validated and optimized this algorithm, which proceeds in four steps.
1. Initialization: K cluster centers are selected at random.
2. Iterative optimization (assignment): The distance from each data point to every cluster center is calculated, and the point is assigned to the closest cluster.
3. Iterative optimization (update): For each cluster, the mean of all data points within it is calculated and used as the new cluster center.
4. Termination: Steps 2 and 3 repeat until the K cluster centers no longer change significantly.
The K value was determined using the elbow method, testing values from 3 to 10. A value of k = 7 was chosen as optimal, aligning with the seven emotion categories identified.
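The clustering loop and the elbow curve described above can be sketched in plain Python. This is a generic textbook K-Means, not the team's optimized variant: the function names, the fixed random seed, and the stopping rule (stop when assignments no longer change, which implies the centers no longer move) are assumptions. `elbow_curve` returns only the within-cluster sum of squares for each candidate k; choosing the "elbow" is still done by inspecting that curve.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, seed=0, max_iter=100):
    """Cluster points into k groups; returns (labels, centers, inertia)."""
    rng = random.Random(seed)
    # Initialization: pick k distinct data points as starting centers.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment: each point goes to its closest center.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:  # Termination: nothing changed.
            break
        labels = new_labels
        # Update: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # An empty cluster keeps its previous center.
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    inertia = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, inertia


def elbow_curve(points, k_values):
    """Within-cluster sum of squares for each candidate k."""
    return {k: kmeans(points, k)[2] for k in k_values}
```

In the setting described above, `points` would be the rows of the TF-IDF matrix and `k_values` would be `range(3, 11)`.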
2.6. Ethics approval
Ethics approval was obtained from the Institutional Review Board of the Integrated Traditional and Western Medicine Hospital of Linping District (Approval No. 2024_09). All comments were anonymized, with usernames and identifiers removed. The study complied with Chinese data protection regulations. Data collection and analysis complied with the terms and conditions of each platform, including Zhihu, Xiaohongshu, and Weibo.
3. Results
This study included 796 comments, totaling 53,151 words. The temporal analysis showed that discussion grew faster in the last three years, especially in 2024, when it reached 59.5%. By month, May through July had the highest shares (15.3%, 14.6%, and 23.8%); by hour of day, the hours from 9:00 to 15:00 had the highest shares (6.6%, 6.6%, 6.0%, 5.6%, 7.6%, 8.3%, and 6.3%). By user type, non-healthcare users accounted for 68.6%, health professionals for 22.5%, media organizations for 6.5%, and unknown users for 2.4%.
The sentiment analysis results were classified into seven categories: good, happy, surprise, anger, disgust, fear, and sad, as shown in Table 1.
Table 1: The results of content analysis and sentiment analysis.
Positive emotions are shown in Fig 2. The results of positive emotions are divided into four categories. Words like health and persistence are linked to positive psychological drive. Words like specialist and hope are linked to trust. Words like compliance and importance are linked to financial support. Words like accurate and compelling are linked to screening effectiveness. The results of happy emotions are divided into two categories. Words like success and celebration are linked to resilience. Words like reassurance and enhancement are linked to safety. For surprise emotions, words like strange and magical are linked to self-doubt.
Fig 2. Positive emotions.
Negative emotions are shown in Fig 3. For anger emotions, words like outbursts and humiliations are linked to impaired self-esteem. The results of disgust emotions are divided into two categories. Words like helpless and invasive are linked to stigma and embarrassment. Words like worried and severe are linked to perceived self-vulnerability. The results of fear emotions are divided into two categories. Words like terrifying and fatal are linked to a lack of knowledge. Words like fear and nightmare are linked to risk perception. The results of sad emotions are divided into four categories. Words like regrets and remember are linked to procrastination and low perceived susceptibility. Words like pain, loss, and homelessness are linked to regret or anticipated death. Words like nothingness and frustration are linked to loss. Words like loneliness and trauma are linked to loneliness and trauma.
Fig 3. Negative emotions.
4. Discussion
This study revealed seven emotional categories related to cancer screening. Positive emotions highlight opportunities to build trust and resilience, while negative emotions reveal barriers such as fear, stigma, and procrastination.
Screeners experiencing positive emotions often described a sense of social obligation and responsibility, aligning with literature that emphasizes health behaviors as a form of civic participation [28–30]. Screening not only enhanced their personal well-being but also provided social recognition. Trust was another critical dimension: confidence in physicians and screening technologies directly reinforced belief in the process and encouraged participation [31–34]. In addition, participants emphasized the role of policy and financial support in reducing economic barriers, a finding consistent with prior research [35,36]. Lastly, the perceived effectiveness of screening—its accuracy and personalization—was a strong motivator, reducing misdiagnosis fears and fostering security [37–39]. Collectively, these aspects suggest that healthcare systems should emphasize social value, transparent communication, and technical quality, while governments should expand subsidies and funding to guarantee equitable access.
Happy emotion was expressed through resilience and safety. For some, screening represented an opportunity for early detection and reassurance of health, enabling them to confront illness with confidence [40,41]. Others emphasized a release of health concerns and confirmation of their preventive measures, leading to psychological well-being and greater engagement in healthy behaviors [33,42,43]. These findings suggest that professional counseling and consistent feedback from healthcare providers can reinforce resilience and provide reassurance, thereby sustaining long-term screening participation [43].
Unexpected screening results produced feelings of self-doubt and cognitive dissonance. Individuals who previously considered themselves healthy struggled to reconcile their self-image with the new information, which sometimes extended to skepticism toward the healthcare system [43,44]. Such experiences illustrate the fragile balance between information disclosure and patient confidence. While surprising results may catalyze reflection and behavior change, they may also provoke denial or avoidance. Addressing this requires sensitive communication strategies that normalize unexpected outcomes while guiding individuals toward constructive coping [45].
Anger often stemmed from negative screening experiences or frustration at perceived loss of autonomy. Screeners who resisted acknowledging health problems sometimes redirected their anger toward healthcare providers or institutions [33,40,42]. If unmanaged, such emotions can erode trust in the system and discourage follow-up. Interventions should therefore focus on transparent communication, patient autonomy, and timely explanations of results. By fostering mutual respect and clear dialogue, healthcare professionals can transform anger into opportunities for improved patient engagement [46].
Disgust emotion was primarily linked to stigma, embarrassment, and loss of bodily autonomy during invasive procedures. Feelings of vulnerability intensified when individuals perceived a lack of privacy or control over their health [33,42,47]. These reactions highlight the importance of developing less invasive, more comfortable technologies and reducing the psychological burden of screening [48,49]. Additionally, education on the medical necessity of certain procedures, coupled with empathy-based care, can mitigate stigma and embarrassment, encouraging more open participation.
Fear emerged as a particularly complex and ambivalent response. On one hand, perceived risk motivated some to seek reassurance through screening; on the other, fear of diagnosis or treatment deterred participation [42,50,51]. Moreover, fear often extended beyond personal health to concerns about family responsibilities, stability, and social roles. Such dual effects confirm prior findings that fear can both mobilize and paralyze health behaviors [51–53]. Addressing this requires nuanced strategies: campaigns that clarify risks, decision-support tools that reduce uncertainty, and psychological therapies (e.g., cognitive behavioral approaches) to help individuals regulate fear without avoidance [54].
Sad emotion was the most multifaceted negative emotion, encompassing procrastination, regret, loss, anticipated death, and loneliness. Many screeners expressed grief for delaying screening, missing opportunities for early treatment, and fearing the worsening of illness [50,51,55]. Others described helplessness, mourning, and isolation, especially in the absence of social support [33,51,53]. Such findings emphasize the need for comprehensive psychosocial care that integrates emotional support, family involvement, and palliative planning when necessary. Healthcare professionals should actively encourage early screening, share success stories, and ensure continuity of care by coordinating with specialized services and support networks [23,44,54].
This study has four limitations. First, health professionals and media organizations made up only a small share of the users in our sample, so the results may not reflect their attitudes. We therefore recommend that future studies supplement social media data with medical forum and news media data. Second, social media platforms may amplify extreme emotions through algorithmic recommendations, which could bias the observed distribution of emotions. Third, emotions such as trust and fear were associated with reported screening attitudes, but causality cannot be established from observational data. Fourth, our sample overrepresents younger users, particularly in the non-healthcare group, while older adults are underrepresented. Older persons are the main target population for cancer screening, yet their voices are largely absent from social media. Future studies should focus on older persons’ attitudes toward cancer screening.
5. Conclusion
This study identified seven categories of emotions in Chinese social media discourse on cancer screening. Positive emotions can be leveraged to strengthen trust and participation, while negative emotions highlight psychological and social barriers. These insights provide guidance for designing emotionally responsive and culturally tailored nursing interventions. Specifically, nurses and health communicators should address fear and stigma through targeted education and counseling while reinforcing positive psychological drivers to promote early screening participation.
References
- 1. International Agency for Research on Cancer. Cancer fact sheets: All cancers. https://gco.iarc.fr/today/en. 2019. Accessed 2024 August 21.
- 2. Chan DNS, So WKW. Effectiveness of motivational interviewing in enhancing cancer screening uptake amongst average-risk individuals: A systematic review. Int J Nurs Stud. 2021;113:103786. doi: 10.1016/j.ijnurstu.2020.103786. PMID: 33091749.
- 3. Smith RA, Andrews KS, Brooks D, Fedewa SA, Manassaram-Baptiste D, Saslow D, et al. Cancer screening in the United States, 2019: A review of current American Cancer Society guidelines and current issues in cancer screening. CA Cancer J Clin. 2019;69(3):184–210. doi: 10.3322/caac.21557. PMID: 30875085.
- 4. Chan DNS, So WKW. A Systematic Review of the Factors Influencing Ethnic Minority Women’s Cervical Cancer Screening Behavior: From Intrapersonal to Policy Level. Cancer Nurs. 2017;40(6):E1–30. doi: 10.1097/NCC.0000000000000436. PMID: 28081032.
- 5. So WKW, Wong CL, Chow KM, Chen JMT, Lam WWT, Chan CWH, et al. The uptake of cervical cancer screening among South Asians and the general population in Hong Kong: A comparative study. Journal of Cancer Policy. 2017;12:90–6. doi: 10.1016/j.jcpo.2017.03.015.
- 6. Hamashima C, Saito H, Sobue T. Awareness of and adherence to cancer screening guidelines among health professionals in Japan. Cancer Sci. 2007;98(8):1241–7. doi: 10.1111/j.1349-7006.2007.00512.x. PMID: 17537173. PMCID: PMC11159036.
- 7. Heisler Z, Eastwood B, Mwaiselage J, Kahesa C, Msami K, Soliman AS. Return on Investment of a Breast Cancer Screening Program in Tanzania: Opportunity for Patient and Public Education. J Cancer Educ. 2022;37(3):701–8. doi: 10.1007/s13187-020-01871-6. PMID: 32980979. PMCID: PMC7997813.
- 8. Yu Z, Li B, Zhao S, Du J, Zhang Y, Liu X, et al. Uptake and detection rate of colorectal cancer screening with colonoscopy in China: A population-based, prospective cohort study. Int J Nurs Stud. 2024;153:104728. doi: 10.1016/j.ijnurstu.2024.104728. PMID: 38461798.
