Limited Generalizability in Argument Mining: State-Of-The-Art Models Learn Datasets, Not Arguments

Marc Feger; Katarina Boland; Stefan Dietze

arXiv:2505.22137 · cs.CL · May 29, 2025

TL;DR

This paper critically evaluates state-of-the-art argument mining models and shows that they often rely on dataset-specific cues and struggle to generalize across datasets, despite strong benchmark performance.

Contribution

The paper provides the first large-scale re-evaluation of transformer models for argument mining. It highlights their reliance on individual datasets and proposes methods to improve generalization.

Findings

1. Models rely on lexical shortcuts tied to content words.
2. Performance drops significantly on unseen datasets.
3. Task-specific pre-training improves robustness and generalization.

Abstract

Identifying arguments is a necessary prerequisite for various tasks in automated discourse analysis, particularly within contexts such as political debates, online discussions, and scientific reasoning. In addition to theoretical advances in understanding the constitution of arguments, a significant body of research has emerged around practical argument mining, supported by a growing number of publicly available datasets. On these benchmarks, BERT-like transformers have consistently performed best, reinforcing the belief that such models are broadly applicable across diverse contexts of debate. This study offers the first large-scale re-evaluation of such state-of-the-art models, with a specific focus on their ability to generalize in identifying arguments. We evaluate four transformers, three standard and one enhanced with contrastive pre-training for better generalization, on 17…

Taxonomy

Topics: Topic Modeling · Sentiment Analysis and Opinion Mining · Multi-Agent Systems and Negotiation