# Improving Legal Information Retrieval by Distributional Composition with Term Order Probabilities

**Authors:** Danilo S. Carvalho, Duc-Vu Tran, Van-Khanh Tran, Le-Nguyen Minh

arXiv: 1706.01038 · 2017-06-13

## TL;DR

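This paper proposes a two-stage legal information retrieval method for the COLIEE competition: candidates are first ranked with n-gram lexical statistics, and rankings judged unreliable are then disambiguated using distributional sentence representations. Competition and experimental results show small gains in overall retrieval performance.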

## Contribution

It introduces a combination of lexical (n-gram) statistics and distributional sentence representations for legal IR, joined by disambiguation rules that are applied over the lexical rankings.

## Key findings

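- Competition and experimental results indicate small gains in overall retrieval performance
- Disambiguation is applied only when a ranking result is judged unreliable for a given query
- An analysis of error and improvement cases clarifies what the approach contributes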

## Abstract

Legal professionals worldwide are currently trying to get up-to-pace with the explosive growth in legal document availability through digital means. This drives a need for high efficiency Legal Information Retrieval (IR) and Question Answering (QA) methods. The IR task in particular has a set of unique challenges that invite the use of semantic motivated NLP techniques. In this work, a two-stage method for Legal Information Retrieval is proposed, combining lexical statistics and distributional sentence representations in the context of Competition on Legal Information Extraction/Entailment (COLIEE). The combination is done with the use of disambiguation rules, applied over the rankings obtained through n-gram statistics. After the ranking is done, its results are evaluated for ambiguity, and disambiguation is done if a result is decided to be unreliable for a given query. Competition and experimental results indicate small gains in overall retrieval performance using the proposed approach. Additionally, an analysis of error and improvement cases is presented for a better understanding of the contributions.
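The two-stage flow described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the authors' implementation: the TF-IDF-style n-gram score, the relative score margin used as the ambiguity test, the hypothetical `margin` and `top_k` parameters, the toy word vectors, and above all the sentence representation, which is a plain average of word vectors rather than the term-order-probability composition named in the title.

```python
import math
from collections import Counter

def ngrams(tokens, n_max=2):
    """All word n-grams of length 1..n_max, as tuples."""
    return [tuple(tokens[i:i + n]) for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]

def lexical_rank(query, docs, n_max=2):
    """Stage 1: rank documents by IDF-weighted n-gram overlap with the query."""
    doc_grams = [Counter(ngrams(d.lower().split(), n_max)) for d in docs]
    df = Counter(g for grams in doc_grams for g in grams)  # document frequency
    q_grams = set(ngrams(query.lower().split(), n_max))
    scores = []
    for idx, grams in enumerate(doc_grams):
        score = sum(grams[g] * math.log(1 + len(docs) / df[g])
                    for g in q_grams if g in grams)
        scores.append((score, idx))
    # Highest score first; ties fall back to document order.
    return sorted(scores, key=lambda s: (-s[0], s[1]))

def sentence_vector(text, word_vectors, dim):
    """Assumed composition: average of the known word vectors."""
    known = [word_vectors[t] for t in text.lower().split() if t in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, word_vectors, dim, margin=0.1, top_k=2):
    """Return (document index, whether disambiguation was used).

    Assumed ambiguity rule: the ranking is unreliable when the top two
    lexical scores are within a relative `margin` of each other.
    Requires at least two documents.
    """
    ranking = lexical_rank(query, docs)
    (best, best_idx), (second, _) = ranking[0], ranking[1]
    ambiguous = best == 0 or (best - second) / best < margin
    if not ambiguous:
        return best_idx, False
    # Stage 2: re-rank the top candidates by distributional similarity.
    q_vec = sentence_vector(query, word_vectors, dim)
    candidates = [idx for _, idx in ranking[:top_k]]
    chosen = max(candidates, key=lambda i: cosine(
        q_vec, sentence_vector(docs[i], word_vectors, dim)))
    return chosen, True

# Toy data (hypothetical): "invalid" appears in no document, so the lexical
# stage ties and the word vectors decide.
docs = ["the contract is valid", "the contract is void", "a tenant pays rent"]
vectors = {"void": [1.0, 0.0], "invalid": [0.9, 0.1], "valid": [0.0, 1.0]}

print(retrieve("the contract is invalid", docs, vectors, dim=2))  # (1, True)
print(retrieve("contract is void", docs, vectors, dim=2))         # (1, False)
```

In the first query the two contract sentences receive identical lexical scores, so the ranking is treated as unreliable and the vector similarity between "invalid" and "void" selects the second document. In the second query the lexical margin is wide, so the stage-one result is returned unchanged.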

---
Source: https://tomesphere.com/paper/1706.01038