ConQRet: Benchmarking Fine-Grained Evaluation of Retrieval Augmented Argumentation with LLM Judges
Kaustubh D. Dhole, Kai Shu, Eugene Agichtein

TL;DR
This paper introduces ConQRet, a benchmark for evaluating retrieval-augmented argumentation using multiple fine-grained LLM judges, addressing the limitations of existing datasets and evaluation methods for complex, evidence-based arguments.
Contribution
The paper proposes a new benchmark and an automated evaluation framework that uses multiple LLM judges for fine-grained assessment of retrieval effectiveness and argument quality in complex, real-world scenarios.
Findings
LLM judges outperform traditional metrics and crowdsourcing in evaluation accuracy.
ConQRet provides a comprehensive dataset of long, complex arguments grounded in real-world evidence.
Automated evaluation methods show promise for advancing computational argumentation research.
Abstract
Computational argumentation, which involves generating answers or summaries for controversial topics like abortion bans and vaccination, has become increasingly important in today's polarized environment. Sophisticated LLM capabilities offer the potential to provide nuanced, evidence-based answers to such questions through Retrieval-Augmented Argumentation (RAArg), leveraging real-world evidence for high-quality, grounded arguments. However, evaluating RAArg remains challenging, as human evaluation is costly and difficult for complex, lengthy answers on complicated topics. At the same time, re-using existing argumentation datasets is no longer sufficient, as they lack long, complex arguments and realistic evidence from potentially misleading sources, limiting holistic evaluation of retrieval effectiveness and argument quality. To address these gaps, we investigate automated evaluation…
Taxonomy
Topics: Artificial Intelligence in Law · Legal Education and Practice Innovations · Law, Economics, and Judicial Systems
