The Craft of Selective Prediction: Towards Reliable Case Outcome Classification -- An Empirical Study on European Court of Human Rights Cases

T.Y.S.S. Santosh; Irtiza Chowdhury; Shanshan Xu; Matthias Grabmair

arXiv:2409.18645·cs.CL·September 30, 2024

TL;DR

This empirical study investigates how different design choices affect the reliability of legal case outcome classification models, emphasizing confidence estimation and model calibration in high-stakes legal NLP tasks.

Contribution

The paper systematically examines how pre-training data, model size, and confidence estimation methods affect model reliability in legal NLP, a focus it presents as novel in this domain.

Findings

1. Diverse, domain-specific pre-training improves calibration.
2. Larger models tend to be overconfident.
3. Monte Carlo dropout provides reliable confidence estimates.
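
To make the third finding concrete, here is a minimal sketch of Monte Carlo dropout for a multi-label classifier, written in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the toy one-hidden-layer network, the weights `W_HIDDEN` and `W_OUT`, the dropout rate, the number of passes, and the per-label confidence `max(p, 1 - p)` are all hypothetical choices that this page does not specify. The idea is to keep dropout active at inference time, run several stochastic forward passes, and average the resulting probabilities.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out, drop_p, rng):
    """One stochastic pass through a toy one-hidden-layer multi-label net."""
    hidden = []
    for row in w_hidden:
        h = max(0.0, sum(w * xi for w, xi in zip(row, x)))  # ReLU unit
        # Inverted dropout: drop the unit with probability drop_p,
        # otherwise rescale so the expected activation is unchanged.
        keep = rng.random() >= drop_p
        hidden.append(h / (1.0 - drop_p) if keep else 0.0)
    # One independent sigmoid per label (multi-label setting).
    return [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_out]


def mc_dropout_predict(x, w_hidden, w_out, drop_p=0.1, n_passes=50, seed=0):
    """Average label probabilities over n_passes dropout-enabled passes."""
    if not 0.0 <= drop_p < 1.0:
        raise ValueError("drop_p must be in [0, 1)")
    rng = random.Random(seed)
    passes = [forward(x, w_hidden, w_out, drop_p, rng) for _ in range(n_passes)]
    mean = [sum(p[j] for p in passes) / n_passes for j in range(len(w_out))]
    # Hypothetical confidence score: how far the mean probability sits
    # from the 0.5 decision boundary, expressed as max(p, 1 - p).
    confidence = [max(m, 1.0 - m) for m in mean]
    return mean, confidence


# Hypothetical toy weights: 2 input features, 3 hidden units, 2 labels.
W_HIDDEN = [[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]]
W_OUT = [[1.0, -1.0, 0.5], [-0.5, 0.7, 0.2]]

if __name__ == "__main__":
    probs, conf = mc_dropout_predict([1.0, 2.0], W_HIDDEN, W_OUT)
    print(probs, conf)
```

With `drop_p=0.0` every pass is identical, so the averaged probabilities reduce to those of a single deterministic forward pass. The spread across passes (not computed here) is another commonly used uncertainty signal.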

Abstract

In high-stakes decision-making tasks within legal NLP, such as Case Outcome Classification (COC), quantifying a model's predictive confidence is crucial. Confidence estimation enables humans to make more informed decisions, particularly when the model's certainty is low, or where the consequences of a mistake are significant. However, most existing COC works prioritize high task performance over model reliability. This paper conducts an empirical investigation into how various design choices including pre-training corpus, confidence estimator and fine-tuning loss affect the reliability of COC models within the framework of selective prediction. Our experiments on the multi-label COC task, focusing on European Court of Human Rights (ECtHR) cases, highlight the importance of a diverse yet domain-specific pre-training corpus for better calibration. Additionally, we demonstrate that larger…
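
The selective prediction framework mentioned in the abstract can be sketched as follows. In its standard formulation, a model answers only when its confidence reaches a threshold and abstains otherwise. Coverage is the fraction of inputs it answers, and selective risk is the error rate on those answered inputs. The function names, the toy data, and the choice of thresholds below are assumptions made for illustration. The excerpt does not state which exact metrics the paper reports.

```python
def selective_metrics(confidences, correct, threshold):
    """Return (coverage, selective risk) when abstaining below threshold."""
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need two non-empty sequences of equal length")
    # Keep the correctness flags of the predictions the model answers.
    answered = [ok for c, ok in zip(confidences, correct) if c >= threshold]
    coverage = len(answered) / len(confidences)
    # Error rate on answered inputs; defined as 0.0 if nothing is answered.
    risk = 1.0 - sum(answered) / len(answered) if answered else 0.0
    return coverage, risk


def risk_coverage_curve(confidences, correct):
    """Sweep the threshold over each observed confidence, highest first."""
    thresholds = sorted(set(confidences), reverse=True)
    return [selective_metrics(confidences, correct, t) for t in thresholds]


if __name__ == "__main__":
    # Hypothetical confidences and correctness flags for four predictions.
    conf = [0.9, 0.8, 0.6, 0.55]
    ok = [True, True, False, True]
    for coverage, risk in risk_coverage_curve(conf, ok):
        print(f"coverage={coverage:.2f} risk={risk:.2f}")
```

A well-behaved confidence estimator should let risk fall as coverage shrinks, because the predictions it discards first are the ones most likely to be wrong.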

Taxonomy

Topics: European and International Law Studies

Methods: Dropout · Monte Carlo Dropout