Know When to Fuse: Investigating Non-English Hybrid Retrieval in the Legal Domain
Antoine Louis, Gijs van Dijck, Gerasimos Spanakis

TL;DR
This study examines hybrid retrieval over French legal texts. Fusion improves zero-shot performance but can reduce in-domain performance unless the fusion weights are carefully tuned, which extends what is known about hybrid search to a non-English, domain-specific setting.
Contribution
The paper investigates the effectiveness of hybrid search for French legal retrieval, contrasting zero-shot and in-domain scenarios and offering new insights into fusion strategies in non-English contexts.
Findings
Fusion improves zero-shot retrieval performance.
With models trained in-domain, fusion generally reduces performance unless the scores are carefully weighted.
The study extends the understanding of hybrid search to non-English, specialized domains.
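
To make the contrast between rank-based and weighted score fusion concrete, here is a minimal Python sketch of two commonly used fusion strategies: reciprocal rank fusion, and a weighted sum of min-max normalized scores. This summary does not say which fusion methods the paper evaluates, so both are illustrative assumptions. The function names, the constant k=60, the weight alpha, and the toy scores are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc ids; k=60 is a conventional default (assumption)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every document it ranks.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending), breaking ties by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def min_max_normalize(scores):
    """Rescale a {doc_id: score} mapping to the [0, 1] range."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so there is no ranking signal to preserve.
        return {d: 0.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def weighted_score_fusion(sparse_scores, dense_scores, alpha=0.5):
    """Weighted sum of normalized scores; alpha weights the sparse retriever."""
    sparse_n = min_max_normalize(sparse_scores)
    dense_n = min_max_normalize(dense_scores)
    fused = {
        # A document missing from one retriever's results contributes 0 there.
        d: alpha * sparse_n.get(d, 0.0) + (1.0 - alpha) * dense_n.get(d, 0.0)
        for d in set(sparse_n) | set(dense_n)
    }
    return sorted(fused, key=lambda d: (-fused[d], d))


# Toy inputs, invented purely for illustration.
sparse = {"art_12": 12.0, "art_7": 8.0, "art_3": 2.0}  # e.g. lexical scores
dense = {"art_12": 0.2, "art_7": 0.9, "art_3": 0.7}    # e.g. embedding similarities

rrf_order = reciprocal_rank_fusion([
    sorted(sparse, key=sparse.get, reverse=True),
    sorted(dense, key=dense.get, reverse=True),
])
balanced_order = weighted_score_fusion(sparse, dense, alpha=0.5)
sparse_only_order = weighted_score_fusion(sparse, dense, alpha=1.0)
```

In this sketch, alpha is the quantity that would need tuning on held-out in-domain data. Setting alpha to 1.0 or 0.0 recovers a single retriever's ranking, so a tuned weighted fusion can fall back to the stronger standalone model when fusing does not help.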
Abstract
Hybrid search has emerged as an effective strategy to offset the limitations of different matching paradigms, especially in out-of-domain contexts where notable improvements in retrieval quality have been observed. However, existing research predominantly focuses on a limited set of retrieval methods, evaluated in pairs on domain-general datasets exclusively in English. In this work, we study the efficacy of hybrid search across a variety of prominent retrieval models within the unexplored field of law in the French language, assessing both zero-shot and in-domain scenarios. Our findings reveal that in a zero-shot context, fusing different domain-general models consistently enhances performance compared to using a standalone model, regardless of the fusion method. Surprisingly, when models are trained in-domain, we find that fusion generally diminishes performance relative to using the…
Taxonomy
Topics: Interpreting and Communication in Healthcare · Artificial Intelligence in Law · Translation Studies and Practices
Methods: Sparse Evolutionary Training
