JUÁ -- A Benchmark for Information Retrieval in Brazilian Legal Text Collections
Jayr Pereira, Leandro Fernandes, Erick de Brito, Roberto Lotufo, Luiz Bonifacio

TL;DR
JUÁ is a benchmark for evaluating and comparing legal information retrieval methods across diverse Brazilian legal texts, with an emphasis on reproducibility and continuous assessment.
Contribution
The paper introduces JUÁ, a publicly available benchmark infrastructure for Brazilian legal IR that supports heterogeneous collections and multiple retrieval paradigms.
Findings
Domain adaptation improves retrieval on the JUÁ-Juris subset.
BM25 remains competitive in lexical and institutional contexts.
The benchmark effectively distinguishes between different retrieval approaches.
Abstract
Legal information retrieval in Portuguese remains difficult to evaluate systematically because available datasets differ widely in document type, query style, and relevance definition. We present JUÁ, a public benchmark for Brazilian legal retrieval designed to support more reproducible and comparable evaluation across heterogeneous legal collections. More broadly, JUÁ is intended not only as a benchmark, but as a continuous evaluation infrastructure for Brazilian legal IR, combining shared protocols, common ranking metrics, fixed splits when applicable, and a public leaderboard. The benchmark covers jurisprudence retrieval as well as broader legislative, regulatory, and question-driven legal search. We evaluate lexical, dense, and BM25-based reranking pipelines, including a domain-adapted Qwen embedding model fine-tuned on JUÁ-aligned supervision. Results show that the benchmark…
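To make the lexical baseline and the notion of a common ranking metric concrete, here is a minimal sketch of Okapi BM25 scoring followed by nDCG@k. It is not the benchmark's own code. The toy documents, the whitespace tokenizer, the parameter values k1=1.2 and b=0.75, and the choice of nDCG as the metric are all assumptions made for illustration; the abstract does not say which metrics or settings JUÁ uses.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.2, b=0.75):
    """Score each tokenized document against the query with Okapi BM25.

    k1 and b are common default values, assumed here for illustration.
    """
    n_docs = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs_tokens:
        df.update(set(d))
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Non-negative IDF variant (the "+1 inside the log" form).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def ndcg_at_k(ranked_ids, relevance, k=10):
    """nDCG@k with linear gain; `relevance` maps doc id -> graded label."""
    dcg = sum(
        relevance.get(doc_id, 0) / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_ids[:k])
    )
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical toy collection (not taken from JUÁ).
docs = {
    "d1": "recurso especial civil contrato",
    "d2": "habeas corpus prisão preventiva",
    "d3": "contrato de locação rescisão",
}
doc_ids = list(docs)
scores = bm25_scores("habeas corpus".split(), [docs[i].split() for i in doc_ids])
# Rank document ids by descending BM25 score.
ranked = [i for i, _ in sorted(zip(doc_ids, scores), key=lambda p: p[1], reverse=True)]
print(ranked, ndcg_at_k(ranked, {"d2": 1}))
```

A dense retriever or a reranker could be scored the same way: whatever produces `ranked`, the metric function only needs the ordered ids and the relevance labels.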
