TL;DR
This paper evaluates the effectiveness of rationale extraction techniques for explaining legal outcome predictions in ECtHR cases, revealing discrepancies between model explanations and legal expert judgments.
Contribution
It introduces a new ECtHR dataset, compares interpretability methods, and highlights differences between model rationales and expert reasoning.
Findings
Model rationales differ from the reasoning of legal experts.
Existing explanation techniques may lack plausibility in legal contexts.
The source code for experiments is publicly available.
Abstract
Interpretability is critical for applications of large language models (LLMs) in the legal domain, where trust and transparency are essential. A central NLP task in this setting is legal outcome prediction, where models forecast whether a court will find a violation of a given right. We study this task on decisions from the European Court of Human Rights (ECtHR), introducing a new ECtHR dataset with carefully curated positive (violation) and negative (non-violation) cases. Existing work proposes both task-specific approaches and model-agnostic techniques to explain downstream performance, but it remains unclear which techniques best explain legal outcome prediction. To address this, we propose a comparative analysis framework for model-agnostic interpretability methods. We focus on two rationale extraction techniques that justify model outputs with concise, human-interpretable text…
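To make the comparison between model rationales and expert judgments concrete, one common way to quantify agreement is token-level overlap between the two selections. The sketch below is an illustration only and is not taken from the paper: the function name `token_f1`, the toy token indices, and the choice of F1 as the agreement score are all assumptions.

```python
def token_f1(model_rationale: set[int], expert_rationale: set[int]) -> float:
    """Token-level F1 between two rationales given as sets of token indices.

    Hypothetical helper for illustration; the paper's actual metric may differ.
    """
    # Two empty rationales agree trivially.
    if not model_rationale and not expert_rationale:
        return 1.0
    overlap = len(model_rationale & expert_rationale)
    # No shared tokens (or one side empty) means no agreement.
    if overlap == 0:
        return 0.0
    precision = overlap / len(model_rationale)  # share of model tokens experts also chose
    recall = overlap / len(expert_rationale)    # share of expert tokens the model recovered
    return 2 * precision * recall / (precision + recall)


# Toy example with made-up token positions in a case description.
model_tokens = {3, 4, 5, 10}
expert_tokens = {4, 5, 6, 7}
score = token_f1(model_tokens, expert_tokens)
print(f"rationale agreement (token F1): {score:.2f}")
```

A low score under a measure like this would indicate that the text a model highlights overlaps little with what an expert considers decisive, which is the kind of discrepancy the summary above describes.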
