A comprehensive study of LLM-based argument classification: from LLAMA through GPT-4o to Deepseek-R1

Marcin Pietroń; Rafał Olszowski; Jakub Gomułka; Filip Gampel; Andrzej Tomski

arXiv:2507.08621·cs.CL·July 25, 2025

TL;DR

This study evaluates various large language models, including GPT-4o, Llama, and Deepseek-R1, on argument classification tasks across multiple datasets, highlighting their strengths, weaknesses, and the impact of reasoning prompts.

Contribution

It provides the first comprehensive analysis of LLM performance on argument classification datasets, revealing model strengths, common errors, and limitations of prompt algorithms.

Findings

01

GPT-4o outperforms other models in argument classification benchmarks.

02

Among models with reasoning capabilities, Deepseek-R1 performs best.

03

All models still make notable errors, indicating room for improvement.

Abstract

Argument mining (AM) is an interdisciplinary research field that integrates insights from logic, philosophy, linguistics, rhetoric, law, psychology, and computer science. It involves the automatic identification and extraction of argumentative components, such as premises and claims, and the detection of relationships between them, such as support, attack, or neutrality. Recently, the field has advanced significantly, especially with the advent of large language models (LLMs), which analyze and extract argument semantics more efficiently than traditional methods and other deep learning models. Many benchmarks exist for testing and verifying the quality of LLMs, but research and results on how these models perform on publicly available argument classification datasets are still lacking. This paper presents a study of a selection of LLMs, using…
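To make the task in the abstract concrete, here is a minimal Python sketch of relation classification between a claim and a premise, framed as a prompt plus a label parser. The label set follows the relations named in the abstract (support, attack, neutrality). Everything else is an assumption for illustration: the names `ArgumentPair`, `build_prompt`, and `parse_label` are hypothetical, the prompt wording is not the paper's, and no model is called.

```python
from dataclasses import dataclass
from typing import Optional

# Relation labels as named in the abstract; the exact label strings are an assumption.
RELATION_LABELS = ("support", "attack", "neutral")


@dataclass(frozen=True)
class ArgumentPair:
    """One classification instance: a claim and a premise that may relate to it."""
    claim: str
    premise: str


def build_prompt(pair: ArgumentPair) -> str:
    """Build a hypothetical zero-shot prompt (not the prompt used in the paper)."""
    return (
        "Classify the relation of the premise to the claim.\n"
        f"Claim: {pair.claim}\n"
        f"Premise: {pair.premise}\n"
        f"Answer with one word: {', '.join(RELATION_LABELS)}."
    )


def parse_label(model_output: str) -> Optional[str]:
    """Return the known label that appears earliest in the reply, or None."""
    text = model_output.lower()
    hits = [(text.find(label), label) for label in RELATION_LABELS if label in text]
    return min(hits)[1] if hits else None
```

The prompt string would be sent to whichever model is under evaluation, and `parse_label` maps its free-text reply back to one of the three labels. A reply containing none of them yields `None`, which an evaluation script could count as an error.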
