Fine-grained Fallacy Detection with Human Label Variation
Alan Ramponi, Agnese Daffara, Sara Tonelli

TL;DR
This paper presents Faina, a novel dataset for fallacy detection that captures human disagreement and multiple plausible answers, along with a framework for evaluation that accounts for label variation and partial matches.
Contribution
Introduces Faina, the first fallacy detection dataset embracing human label variation, and proposes an evaluation framework that handles multiple reliable labels and partial span matches.
Findings
Multi-task and multi-label transformer models perform strongly across setups.
The Faina dataset reveals significant human disagreement in fallacy annotations.
The evaluation framework accounts for multiple equally reliable test sets, partial span matches, overlaps, and the varying severity of labeling errors.
Abstract
We introduce Faina, the first dataset for fallacy detection that embraces multiple plausible answers and natural disagreement. Faina includes over 11K span-level annotations with overlaps across 20 fallacy types on social media posts in Italian about migration, climate change, and public health given by two expert annotators. Through an extensive annotation study that allowed discussion over multiple rounds, we minimize annotation errors whilst keeping signals of human label variation. Moreover, we devise a framework that goes beyond "single ground truth" evaluation and simultaneously accounts for multiple (equally reliable) test sets and the peculiarities of the task, i.e., partial span matches, overlaps, and the varying severity of labeling errors. Our experiments across four fallacy detection setups show that multi-task and multi-label transformer-based approaches are strong…
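To make the idea of evaluating against multiple equally reliable test sets with partial span matches more concrete, the following is a minimal sketch and not the paper's actual metric. It assumes spans are non-empty `(start, end, label)` tuples with an exclusive end offset, gives partial credit by character overlap between same-label spans, and averages a soft F1 over each annotator's reference set. All function names and the example label are hypothetical, and the sketch ignores the severity weighting of labeling errors mentioned in the abstract.

```python
from typing import List, Tuple

# Hypothetical span representation: (start, end_exclusive, label).
Span = Tuple[int, int, str]


def overlap(a: Span, b: Span) -> int:
    """Number of character positions shared by two spans."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def coverage(sources: List[Span], targets: List[Span]) -> float:
    """Mean fraction of each source span covered by its best same-label target."""
    if not sources:
        return 0.0
    total = 0.0
    for s in sources:
        best = max((overlap(s, t) for t in targets if t[2] == s[2]), default=0)
        total += best / (s[1] - s[0])  # assumes non-empty spans
    return total / len(sources)


def soft_f1(pred: List[Span], gold: List[Span]) -> float:
    """Harmonic mean of overlap-based precision and recall."""
    precision = coverage(pred, gold)
    recall = coverage(gold, pred)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def multi_reference_f1(pred: List[Span], golds: List[List[Span]]) -> float:
    """Average soft F1 over several reference sets, treating each as equally reliable."""
    return sum(soft_f1(pred, g) for g in golds) / len(golds)


# Illustrative data only: two annotators mark the same label on different spans.
pred = [(0, 10, "ad_hominem")]
gold_a = [(0, 10, "ad_hominem")]
gold_b = [(5, 15, "ad_hominem")]
score = multi_reference_f1(pred, [gold_a, gold_b])
```

Averaging over reference sets, rather than merging them into one, keeps each annotator's judgments intact, so a prediction that agrees with only one annotator still earns credit instead of being scored against a single adjudicated label.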
Taxonomy
Topics: Machine Learning and Data Classification · Handwritten Text Recognition Techniques
