JUÁ -- A Benchmark for Information Retrieval in Brazilian Legal Text Collections

Jayr Pereira; Leandro Fernandes; Erick de Brito; Roberto Lotufo; Luiz Bonifacio

arXiv:2604.06098·cs.IR·April 9, 2026

TL;DR

JUÁ is a benchmark for evaluating and comparing legal information retrieval methods across diverse Brazilian legal text collections, with an emphasis on reproducibility and continuous assessment.

Contribution

It introduces JUÁ, a publicly available benchmark infrastructure for Brazilian legal IR that supports heterogeneous collections and multiple retrieval paradigms.

Findings

01 Domain adaptation improves retrieval on the JUÁ-Juris subset.

02 BM25 remains competitive in lexical and institutional contexts.

03 The benchmark effectively distinguishes between different retrieval approaches.

Abstract

Legal information retrieval in Portuguese remains difficult to evaluate systematically because available datasets differ widely in document type, query style, and relevance definition. We present JUÁ, a public benchmark for Brazilian legal retrieval designed to support more reproducible and comparable evaluation across heterogeneous legal collections. More broadly, JUÁ is intended not only as a benchmark, but as a continuous evaluation infrastructure for Brazilian legal IR, combining shared protocols, common ranking metrics, fixed splits when applicable, and a public leaderboard. The benchmark covers jurisprudence retrieval as well as broader legislative, regulatory, and question-driven legal search. We evaluate lexical, dense, and BM25-based reranking pipelines, including a domain-adapted Qwen embedding model fine-tuned on JUÁ-aligned supervision. Results show that the benchmark…
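The abstract refers to "common ranking metrics" without naming them in the portion shown here. As an illustration only, the sketch below computes two metrics that are widely used in retrieval evaluation, nDCG@k and reciprocal rank, for a single query. The choice of these two metrics, the function names, and the document identifiers are assumptions made for this example and are not taken from the paper.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k graded relevance values."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids, qrels, k=10):
    """nDCG@k for one query.

    ranked_ids: document ids in the order the system returned them.
    qrels: dict mapping document id -> graded relevance (missing ids count as 0).
    """
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_ids]
    ideal = sorted(qrels.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def reciprocal_rank(ranked_ids, qrels):
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if qrels.get(doc_id, 0) > 0:
            return 1.0 / rank
    return 0.0


if __name__ == "__main__":
    # Hypothetical judgments and system ranking for a single query.
    qrels = {"doc_a": 2, "doc_b": 1}
    ranking = ["doc_c", "doc_a", "doc_b"]
    print(f"nDCG@10 = {ndcg_at_k(ranking, qrels, k=10):.4f}")
    print(f"RR      = {reciprocal_rank(ranking, qrels):.4f}")
```

Averaging these per-query values over all queries in a fixed split gives one score per system, which is the kind of number a shared leaderboard would typically compare.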
