Retrieving Semantically Similar Decisions under Noisy Institutional Labels: Robust Comparison of Embedding Methods

Tereza Novotna; Jakub Harasta

arXiv:2512.05681·cs.CL·December 8, 2025

TL;DR

This paper compares general-purpose and domain-specific embedding models for retrieving Czech Constitutional Court decisions, demonstrating the robustness of a noise-aware evaluation framework despite noisy labels and modest absolute performance.

Contribution

It introduces a noise-aware evaluation method for comparing embedding models in legal retrieval tasks with noisy labels, and finds that the general-purpose OpenAI embedder outperforms the domain-specific BERT in this setting.

Findings

01 The general-purpose OpenAI embedder outperforms the domain-specific BERT on nDCG at @10/@20/@100

02 The evaluation framework is robust to noisy labels

03 The differences between the models are statistically significant

Abstract

Retrieving case law is a time-consuming task predominantly carried out by querying databases. We provide a comparison of two models in three different settings for Czech Constitutional Court decisions: (i) a large general-purpose embedder (OpenAI), (ii) a domain-specific BERT trained from scratch on ~30,000 decisions using sliding windows and attention pooling. We propose a noise-aware evaluation including IDF-weighted keyword overlap as graded relevance, binarization via two thresholds (0.20 balanced, 0.28 strict), significance via paired bootstrap, and an nDCG diagnosis supported with qualitative analysis. Despite modest absolute nDCG (expected under noisy labels), the general OpenAI embedder decisively outperforms the domain pre-trained BERT in both settings at @10/@20/@100 across both thresholds; differences are statistically significant. Diagnostics attribute low absolutes to label…
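The evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not define the overlap formula, so the IDF-weighted Jaccard form below is an assumption, as are the smoothed IDF, the bootstrap settings, and every function name. Only the two thresholds (0.20 and 0.28) and the choice of nDCG and paired bootstrap come from the abstract.

```python
import math
import random


def idf_weights(keyword_sets):
    """Smoothed IDF per keyword over a collection of keyword sets (assumed form)."""
    n = len(keyword_sets)
    df = {}
    for kws in keyword_sets:
        for k in set(kws):
            df[k] = df.get(k, 0) + 1
    return {k: math.log((1 + n) / (1 + c)) + 1.0 for k, c in df.items()}


def graded_relevance(query_kws, doc_kws, idf):
    """IDF-weighted Jaccard overlap in [0, 1]; an assumed reading of the paper's measure."""
    q, d = set(query_kws), set(doc_kws)
    union = sum(idf.get(k, 0.0) for k in q | d)
    if union == 0:
        return 0.0
    return sum(idf.get(k, 0.0) for k in q & d) / union


def binarize(gains, threshold):
    """Turn graded relevance into 0/1 labels (0.20 balanced, 0.28 strict in the abstract)."""
    return [1.0 if g >= threshold else 0.0 for g in gains]


def ndcg_at_k(ranked_gains, k):
    """nDCG@k; the ideal ordering is computed over the gains passed in."""
    def dcg(gains):
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

    ideal = dcg(sorted(ranked_gains, reverse=True))
    return dcg(ranked_gains) / ideal if ideal > 0 else 0.0


def paired_bootstrap(scores_a, scores_b, n_resamples=2000, seed=0):
    """Resample per-query differences with replacement.

    Returns the observed mean difference (A minus B) and the fraction of
    resamples in which A does not beat B.
    """
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    not_better = 0
    for _ in range(n_resamples):
        sample_mean = sum(diffs[rng.randrange(n)] for _ in range(n)) / n
        if sample_mean <= 0:
            not_better += 1
    return sum(diffs) / n, not_better / n_resamples
```

In use, each query decision would get one nDCG score per model, and the two per-query score lists would be passed to `paired_bootstrap`. Pairing by query keeps the per-query difficulty fixed across the two models, which is why the resampling is done on differences rather than on each model's scores separately.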

Taxonomy

Topics: Artificial Intelligence in Law · Explainable Artificial Intelligence (XAI) · Computational and Text Analysis Methods