CoCoLoFa: A Dataset of News Comments with Common Logical Fallacies Written by LLM-Assisted Crowds

Min-Hsuan Yeh, Ruyuan Wan, and Ting-Hao 'Kenneth' Huang

arXiv:2410.03457·cs.CL·October 7, 2024

Open Access

TL;DR

This paper presents CoCoLoFa, a large dataset of news comments annotated for logical fallacies, created through crowdsourcing aided by LLMs, and demonstrates its effectiveness for training fallacy detection models.

Contribution

Introduces CoCoLoFa, the largest known logical fallacy dataset (7,706 comments on 648 news articles), and shows that pairing crowd workers with an LLM-powered writing assistant yields high-quality data for a complex writing and labeling task.

Findings

01. BERT-based models trained on CoCoLoFa achieved high F1 scores (0.86 and 0.87).

02. The dataset was rated as high quality and reliable by experts.

03. Combining crowdsourcing with LLMs enhances dataset construction for complex tasks.
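
For readers unfamiliar with the metric in the first finding, the following is a minimal sketch of how a binary F1 score is computed from gold and predicted labels. The label lists are made-up illustrative values, not CoCoLoFa data, and the function name `f1_score` is hypothetical. This page does not say how the reported F1 scores were averaged, so this shows only the basic single-class definition.

```python
def f1_score(gold, predicted, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = comment contains a fallacy, 0 = it does not.
gold = [1, 1, 1, 0, 0, 1, 0, 1]
predicted = [1, 1, 0, 0, 1, 1, 0, 1]

score = f1_score(gold, predicted)
print(f"F1 = {score:.2f}")
```

In this toy example there are four true positives, one false positive, and one false negative, so precision and recall are both 0.8.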

Abstract

Detecting logical fallacies in texts can help users spot argument flaws, but automating this detection is not easy. Manually annotating fallacies in large-scale, real-world text data to create datasets for developing and validating detection models is costly. This paper introduces CoCoLoFa, the largest known logical fallacy dataset, containing 7,706 comments for 648 news articles, with each comment labeled for fallacy presence and type. We recruited 143 crowd workers to write comments embodying specific fallacy types (e.g., slippery slope) in response to news articles. Recognizing the complexity of this writing task, we built an LLM-powered assistant into the workers' interface to aid in drafting and refining their comments. Experts rated the writing quality and labeling validity of CoCoLoFa as high and reliable. BERT-based models fine-tuned using CoCoLoFa achieved the highest fallacy…

Code & Models

Datasets

maxime-antoine-dev/fades-dataset
dataset · 76 downloads

Videos

CoCoLoFa: A Dataset of News Comments with Common Logical Fallacies Written by LLM-Assisted Crowds

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification