The Craft of Selective Prediction: Towards Reliable Case Outcome Classification -- An Empirical Study on European Court of Human Rights Cases
T.Y.S.S. Santosh, Irtiza Chowdhury, Shanshan Xu, Matthias Grabmair

TL;DR
This empirical study investigates how different design choices affect the reliability of legal case outcome classification models, emphasizing confidence estimation and model calibration in high-stakes legal NLP tasks.
Contribution
It systematically examines how pre-training data, model size, and confidence estimation methods affect model reliability in legal NLP, an aspect that most existing COC work sets aside in favor of task performance.
Findings
Diverse, domain-specific pre-training improves calibration.
Larger models tend to be overconfident.
Monte Carlo dropout provides reliable confidence estimates.
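As an illustration of the last finding, here is a minimal sketch of Monte Carlo dropout used as a confidence estimator. It is not the authors' implementation: the single-label logistic scorer, the names `stochastic_forward` and `mc_dropout_confidence`, and all parameter values are hypothetical, chosen only to show the mechanism. Dropout stays active at inference time, the input is scored several times, and the mean probability serves as the confidence while the variance indicates how much the passes disagree.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def stochastic_forward(features, weights, bias, drop_rate, rng):
    """One forward pass of a toy logistic scorer with dropout left on."""
    keep = 1.0 - drop_rate
    z = bias
    for x, w in zip(features, weights):
        if rng.random() < keep:       # this input unit survives dropout
            z += (x / keep) * w       # inverted-dropout rescaling
    return sigmoid(z)


def mc_dropout_confidence(features, weights, bias,
                          drop_rate=0.1, passes=50, seed=0):
    """Return (mean probability, variance) over several stochastic passes."""
    rng = random.Random(seed)
    probs = [stochastic_forward(features, weights, bias, drop_rate, rng)
             for _ in range(passes)]
    mean = sum(probs) / passes
    var = sum((p - mean) ** 2 for p in probs) / passes
    return mean, var


if __name__ == "__main__":
    # Hypothetical feature vector and weights for one label (one article).
    mean, var = mc_dropout_confidence([1.0, 0.5, -0.3], [0.8, -0.4, 1.2], 0.1,
                                      drop_rate=0.3)
    print(f"confidence={mean:.3f} variance={var:.5f}")
```

In a multi-label setting such as the one studied here, the same procedure would be applied per label, giving one mean probability and one variance for each label.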
Abstract
In high-stakes decision-making tasks within legal NLP, such as Case Outcome Classification (COC), quantifying a model's predictive confidence is crucial. Confidence estimation enables humans to make more informed decisions, particularly when the model's certainty is low or when the consequences of a mistake are significant. However, most existing COC work prioritizes high task performance over model reliability. This paper conducts an empirical investigation into how various design choices, including pre-training corpus, confidence estimator, and fine-tuning loss, affect the reliability of COC models within the framework of selective prediction. Our experiments on the multi-label COC task, focusing on European Court of Human Rights (ECtHR) cases, highlight the importance of a diverse yet domain-specific pre-training corpus for better calibration. Additionally, we demonstrate that larger…
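The selective prediction framework named in the abstract can be sketched as follows. This is a generic illustration, not the paper's evaluation code: the function names, the threshold values, and the toy data are assumptions. The model answers only when its confidence reaches a threshold and abstains otherwise; coverage is the fraction of inputs answered, and selective risk is the error rate on the answered inputs.

```python
def selective_predict(confidences, predictions, threshold):
    """Keep a prediction only if its confidence meets the threshold."""
    return [p if c >= threshold else None
            for c, p in zip(confidences, predictions)]


def coverage_and_risk(decisions, labels):
    """Coverage = share of inputs answered; risk = error rate among them."""
    answered = [(d, y) for d, y in zip(decisions, labels) if d is not None]
    coverage = len(answered) / len(labels)
    if not answered:
        return coverage, 0.0          # nothing answered, so no errors made
    errors = sum(1 for d, y in answered if d != y)
    return coverage, errors / len(answered)


if __name__ == "__main__":
    # Hypothetical confidences, predictions, and gold labels for four cases.
    confidences = [0.95, 0.60, 0.80, 0.55]
    predictions = [1, 0, 1, 1]
    labels = [1, 1, 1, 0]
    for threshold in (0.0, 0.7):
        decisions = selective_predict(confidences, predictions, threshold)
        print(threshold, coverage_and_risk(decisions, labels))
```

Raising the threshold trades coverage for lower risk only if confidence tracks correctness, which is why calibration matters in this setting.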
Taxonomy
Topics: European and International Law Studies
Methods: Dropout · Monte Carlo Dropout
