Evaluating LLM-Driven Summarisation of Parliamentary Debates with Computational Argumentation
Eoghan Cunningham, Derek Greene, James Cross, Antonio Rago

TL;DR
This paper introduces a formal framework using computational argumentation to evaluate the faithfulness of LLM-generated summaries of parliamentary debates, focusing on argument structure preservation.
Contribution
It presents a novel evaluation method grounded in argumentation theory to assess the accuracy of parliamentary debate summaries produced by LLMs.
Findings
The framework is demonstrated on European Parliament debates and their summaries.
It shows improved alignment with human judgements compared to existing metrics.
The results highlight the importance of argument structure in faithful summarisation.
Abstract
Understanding how policy is debated and justified in parliament is a fundamental aspect of the democratic process. However, the volume and complexity of such debates mean that outside audiences struggle to engage. Meanwhile, Large Language Models (LLMs) have been shown to enable automated summarisation at scale. While summaries of debates can make parliamentary procedures more accessible, evaluating whether these summaries faithfully communicate argumentative content remains challenging. Existing automated summarisation metrics have been shown to correlate poorly with human judgements of consistency (i.e., faithfulness or alignment between summary and source). In this work, we propose a formal framework for evaluating parliamentary debate summaries that grounds argument structures in the contested proposals up for debate. Our novel approach, driven by computational argumentation,…
