HRDE: Retrieval-Augmented Large Language Models for Chinese Health Rumor Detection and Explainability
Yanfang Chen, Ding Chen, Shichao Song, Simin Niu, Hanyu Wang, Zeyun Tang, Feiyu Xiong, Zhiyu Li

TL;DR
This paper introduces HRDE, a retrieval-augmented large language model for Chinese health rumor detection and explainability, utilizing the largest Chinese health rumor dataset to improve accuracy and provide explanations.
Contribution
The paper constructs the largest Chinese health rumor dataset (HealthRCN) and proposes HRDE, a retrieval-augmented model that enhances rumor detection and explainability.
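The retrieval-augmented pattern can be sketched as follows. This is a generic illustration, not the authors' implementation. It assumes a small in-memory evidence corpus and uses character-bigram cosine similarity as the retriever, a deliberately simple stand-in chosen because Chinese text has no whitespace word boundaries. The function names (`char_bigrams`, `retrieve`, `build_prompt`) and the prompt wording are hypothetical. The resulting prompt would be sent to an LLM, which is not shown here.

```python
from collections import Counter
from math import sqrt


def char_bigrams(text: str) -> Counter:
    # Hypothetical helper: Chinese lacks spaces between words, so
    # represent text as overlapping character bigrams (whitespace dropped).
    chars = [c for c in text if not c.isspace()]
    return Counter(a + b for a, b in zip(chars, chars[1:]))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse bigram count vectors.
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Rank evidence passages by similarity to the claim and keep the top k.
    q = char_bigrams(query)
    ranked = sorted(corpus, key=lambda doc: cosine(q, char_bigrams(doc)), reverse=True)
    return ranked[:k]


def build_prompt(claim: str, evidence: list[str]) -> str:
    # Assumed prompt layout: numbered evidence, then the claim to judge and explain.
    numbered = "\n".join(f"[{i + 1}] {e}" for i, e in enumerate(evidence))
    return (
        "Based on the evidence below, decide whether the health claim is a rumor "
        "and explain why.\n"
        f"Evidence:\n{numbered}\n"
        f"Claim: {claim}\n"
        "Answer (rumor / not rumor) and explanation:"
    )


if __name__ == "__main__":
    # Toy evidence corpus (illustrative sentences, not taken from HealthRCN).
    corpus = [
        "喝醋不能软化血管，醋进入消化道后会被中和。",
        "每天适量运动有助于心血管健康。",
        "隔夜菜亚硝酸盐含量通常在安全范围内，但应冷藏保存。",
    ]
    claim = "喝醋可以软化血管"
    print(build_prompt(claim, retrieve(claim, corpus, k=1)))
```

Grounding the model's judgment in retrieved passages is what lets a system of this kind cite evidence in its explanation; a production system would replace the bigram retriever with a proper index over a large evidence store.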
Findings
HRDE achieves 91.04% accuracy in rumor detection.
HRDE outperforms GPT-4-1106-Preview in accuracy and answer quality.
HealthRCN is the largest Chinese health rumor dataset to date.
Abstract
As people increasingly prioritize their health, the speed and breadth of health information dissemination on the internet have also grown. At the same time, the presence of false health information (health rumors) intermingled with genuine content poses a significant potential threat to public health. However, current research on Chinese health rumors still lacks a large-scale, public, and open-source dataset of health rumor information, as well as effective and reliable rumor detection methods. This paper addresses this gap by constructing a dataset containing 1.12 million health-related rumors (HealthRCN) through web scraping of common health-related questions and a series of data processing steps. HealthRCN is the largest known dataset of Chinese health information rumors to date. Based on this dataset, we propose retrieval-augmented large language models for Chinese health rumor…
Taxonomy
Topics: Misinformation and Its Impacts · Topic Modeling · Data-Driven Disease Surveillance
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
