LTCR: Long-Text Chinese Rumor Detection Dataset

Ziyang Ma; Mengsha Liu; Guian Fang; Ying Shen

arXiv:2306.07201 · cs.CL · June 14, 2023 · 1 citation

TL;DR

This paper introduces LTCR, a long-text Chinese rumor detection dataset, and proposes a salience-aware fake news detection model that achieves high accuracy in identifying fake news, especially complex COVID-19-related misinformation.

Contribution

The paper provides a novel long-text Chinese rumor dataset and a salience-aware detection model that improves fake news identification accuracy.

Findings

01. Achieved 95.85% accuracy on the dataset

02. High fake news recall of 90.91%

03. F-score of 90.60% in detection

Abstract

False information can spread quickly on social media, negatively influencing citizens' behavior and responses to social events. To improve the detection of fake news, especially long texts that are harder to identify in full, we propose LTCR, a Long-Text Chinese Rumor detection dataset. LTCR provides a valuable resource for accurately detecting misinformation, especially complex fake news related to COVID-19. The dataset consists of 1,729 real and 500 fake news items, with average lengths of approximately 230 and 152 characters, respectively. We also propose a Salience-aware Fake News Detection Model, which achieves the highest accuracy (95.85%), fake news recall (90.91%) and F-score (90.60%) on the dataset (https://github.com/Enderfga/DoubleCheck).
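
To put the reported figures in context, the following is a minimal Python sketch of how accuracy, fake news recall and F-score are conventionally computed from a confusion matrix, together with the majority-class baseline implied by the 1,729 / 500 class counts. It is not the authors' evaluation code. The `Confusion` class, the function names and the toy counts are hypothetical, and treating "fake" as the positive class with a fake-class F1 is an assumption, since the abstract does not say how the F-score is averaged or which split it is measured on.

```python
from dataclasses import dataclass

# Class counts stated in the abstract.
N_REAL = 1729
N_FAKE = 500


@dataclass(frozen=True)
class Confusion:
    """Binary confusion counts, with 'fake' as the positive class (assumption)."""
    tp: int  # fake news predicted as fake
    fp: int  # real news predicted as fake
    fn: int  # fake news predicted as real
    tn: int  # real news predicted as real


def accuracy(c: Confusion) -> float:
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


def recall(c: Confusion) -> float:
    # Share of all fake items that the classifier actually flags.
    denom = c.tp + c.fn
    return c.tp / denom if denom else 0.0


def precision(c: Confusion) -> float:
    denom = c.tp + c.fp
    return c.tp / denom if denom else 0.0


def f1(c: Confusion) -> float:
    # Harmonic mean of precision and recall for the fake class.
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


def majority_baseline_accuracy(n_real: int, n_fake: int) -> float:
    """Accuracy of always predicting the larger class."""
    return max(n_real, n_fake) / (n_real + n_fake)


if __name__ == "__main__":
    # Always answering "real" on the full dataset is right about 77.6% of the time.
    print(f"majority baseline: {majority_baseline_accuracy(N_REAL, N_FAKE):.2%}")

    # Toy counts for illustration only; these are NOT results from the paper.
    toy = Confusion(tp=9, fp=2, fn=1, tn=38)
    print(f"accuracy={accuracy(toy):.2%} recall={recall(toy):.2%} f1={f1(toy):.2%}")
```

Because roughly three quarters of the items are real news, accuracy alone can look strong even for a weak classifier, which is one reason fake news recall and F-score are reported alongside it.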

Code & Models

Repositories

enderfga/doublecheck (PyTorch, official)

Taxonomy

Topics: Misinformation and Its Impacts · Topic Modeling · Spam and Phishing Detection