Retrieving Semantically Similar Decisions under Noisy Institutional Labels: Robust Comparison of Embedding Methods
Tereza Novotna, Jakub Harasta

TL;DR
This paper compares a general-purpose embedder (OpenAI) with a domain-specific BERT for retrieving Czech Constitutional Court decisions, and shows that a noise-aware evaluation framework yields a robust comparison despite noisy labels and modest absolute scores.
Contribution
The paper introduces a noise-aware evaluation method for comparing embedding models on legal retrieval tasks with noisy labels, and finds that the general-purpose embedder outperforms the domain-specific one in this setting.
Findings
The OpenAI embedder outperforms the domain-specific BERT on nDCG at @10/@20/@100 under both relevance thresholds
The evaluation framework is robust to noisy labels
The differences are statistically significant under a paired bootstrap
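The significance claim rests on a paired bootstrap. A minimal sketch of such a test follows, assuming per-query metric scores (for example nDCG@10) are available for both models on the same queries. The function name `paired_bootstrap`, the number of resamples, and the percentile interval are illustrative assumptions, not details taken from the paper.

```python
import random


def paired_bootstrap(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Paired bootstrap over queries (illustrative sketch, not the authors' code).

    scores_a and scores_b hold one metric value per query, in the same
    query order. Returns the observed mean difference (A minus B), a 95%
    percentile interval for it, and the fraction of resamples in which
    A is not better than B.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired: one value per query for each model")
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = sum(diffs) / n

    means = []
    for _ in range(n_resamples):
        # Resample queries with replacement, keeping each pair together.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)

    means.sort()
    low = means[int(0.025 * n_resamples)]
    high = means[int(0.975 * n_resamples) - 1]
    frac_not_better = sum(m <= 0 for m in means) / n_resamples
    return observed, (low, high), frac_not_better
```

Resampling the per-query differences, rather than each model's scores independently, keeps the pairing intact, so query difficulty that affects both models equally cancels out of the comparison.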
Abstract
Retrieving case law is a time-consuming task predominantly carried out by querying databases. We provide a comparison of two models in three different settings for Czech Constitutional Court decisions: (i) a large general-purpose embedder (OpenAI), and (ii) a domain-specific BERT trained from scratch on ~30,000 decisions using sliding windows and attention pooling. We propose a noise-aware evaluation including IDF-weighted keyword overlap as graded relevance, binarization via two thresholds (0.20 balanced, 0.28 strict), significance via paired bootstrap, and an nDCG diagnosis supported with qualitative analysis. Despite modest absolute nDCG (expected under noisy labels), the general OpenAI embedder decisively outperforms the domain pre-trained BERT in both settings at @10/@20/@100 across both thresholds; differences are statistically significant. Diagnostics attribute low absolutes to label…
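The evaluation steps named in the abstract can be sketched as follows. The abstract does not give the exact overlap formula, so this sketch assumes an IDF-weighted Jaccard ratio over institutional keyword sets with a smoothed IDF. All function names are hypothetical, and only the thresholds 0.20 and 0.28 come from the text.

```python
import math
from collections import Counter


def idf_weights(keyword_sets):
    """Smoothed IDF for every keyword in a collection of keyword sets."""
    n = len(keyword_sets)
    df = Counter(kw for kws in keyword_sets for kw in set(kws))
    return {kw: math.log((1 + n) / (1 + count)) + 1.0 for kw, count in df.items()}


def weighted_overlap(a, b, idf):
    """IDF-weighted Jaccard overlap in [0, 1] (assumed form of graded relevance)."""
    a, b = set(a), set(b)
    union = sum(idf.get(k, 0.0) for k in a | b)
    if union == 0.0:
        return 0.0
    return sum(idf.get(k, 0.0) for k in a & b) / union


def binarize(score, threshold):
    """Binary relevance: 0.20 is the balanced threshold, 0.28 the strict one."""
    return 1 if score >= threshold else 0


def dcg(gains):
    """Discounted cumulative gain with a log2 rank discount."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def ndcg_at_k(ranked_gains, all_gains, k):
    """nDCG@k: DCG of the system ranking divided by DCG of the ideal ranking."""
    ideal = dcg(sorted(all_gains, reverse=True)[:k])
    return dcg(ranked_gains[:k]) / ideal if ideal > 0 else 0.0
```

In use, `ranked_gains` would hold the overlap scores of the retrieved decisions in the order the embedder ranked them, and `all_gains` the overlap scores of every candidate for that query. A rare shared keyword raises the score more than a common one, which is the reason for weighting by IDF when the labels are noisy.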
Taxonomy
Topics: Artificial Intelligence in Law · Explainable Artificial Intelligence (XAI) · Computational and Text Analysis Methods
