Improving Legal Information Retrieval by Distributional Composition with Term Order Probabilities
Danilo S. Carvalho, Duc-Vu Tran, Van-Khanh Tran, Le-Nguyen Minh

TL;DR
This paper proposes a two-stage legal information retrieval method combining lexical and distributional techniques, with disambiguation rules, showing small but meaningful improvements in retrieval performance.
Contribution
It introduces a novel combination of lexical statistics and distributional sentence representations with disambiguation rules for legal IR.
Findings
Small gains in retrieval performance were achieved
Disambiguation improves result reliability
Error analysis provides insights into method limitations
Abstract
Legal professionals worldwide are currently trying to keep pace with the explosive growth in legal document availability through digital means. This drives a need for high-efficiency Legal Information Retrieval (IR) and Question Answering (QA) methods. The IR task in particular has a set of unique challenges that invite the use of semantically motivated NLP techniques. In this work, a two-stage method for Legal Information Retrieval is proposed, combining lexical statistics and distributional sentence representations in the context of the Competition on Legal Information Extraction/Entailment (COLIEE). The combination is done with the use of disambiguation rules, applied over the rankings obtained through n-gram statistics. After the ranking is done, its results are evaluated for ambiguity, and disambiguation is done if a result is judged unreliable for a given query. Competition…
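The two-stage flow described in the abstract (rank by n-gram statistics, check the ranking for ambiguity, disambiguate only when needed) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the IDF-weighted n-gram overlap score, the relative-margin ambiguity rule, the averaged word-vector sentence representation, the toy vectors, and all function names. The term order probabilities mentioned in the title are not modelled.

```python
import math
from collections import Counter


def ngrams(tokens, n_max=2):
    """Return all word n-grams of length 1..n_max as tuples."""
    return [tuple(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def lexical_rank(query, docs, n_max=2):
    """Stage 1 (assumed scoring): IDF-weighted n-gram overlap, best first."""
    doc_grams = [Counter(ngrams(d.lower().split(), n_max)) for d in docs]
    df = Counter(g for grams in doc_grams for g in grams)  # document frequency
    q_grams = set(ngrams(query.lower().split(), n_max))
    scores = [(idx, sum(math.log(1 + len(docs) / df[g])
                        for g in q_grams if g in grams))
              for idx, grams in enumerate(doc_grams)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def is_ambiguous(ranking, margin=0.2):
    """Hypothetical rule: unreliable if the runner-up is within `margin`
    (relative to the top score) of the top result."""
    if len(ranking) < 2:
        return False
    top, second = ranking[0][1], ranking[1][1]
    if top == 0:
        return True  # no lexical evidence at all
    return (top - second) / top < margin


def embed(text, vectors, dim):
    """Assumed sentence representation: mean of the known word vectors."""
    known = [t for t in text.lower().split() if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(vectors[t][k] for t in known) / len(known) for k in range(dim)]


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def retrieve(query, docs, vectors, dim, top_k=3, margin=0.2):
    """Return the index of the selected document."""
    ranking = lexical_rank(query, docs)
    if not is_ambiguous(ranking, margin):
        return ranking[0][0]  # lexical result is trusted as-is
    # Stage 2: re-rank the top-k lexical candidates by distributional similarity.
    q_vec = embed(query, vectors, dim)
    candidates = [idx for idx, _ in ranking[:top_k]]
    return max(candidates,
               key=lambda i: cosine(q_vec, embed(docs[i], vectors, dim)))


# Toy data (hypothetical): both documents tie lexically for this query,
# so the distributional stage decides between them.
toy_docs = ["the seller delivers goods", "the buyer delivers payment"]
toy_vectors = {"vendor": [1.0, 0.0], "seller": [0.9, 0.1], "buyer": [0.0, 1.0]}
best = retrieve("the vendor delivers", toy_docs, toy_vectors, dim=2)
```

In this sketch the distributional stage only runs when the lexical ranking is judged unreliable, which mirrors the abstract's description of applying disambiguation selectively. The `margin` and `top_k` values are arbitrary placeholders.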
