Vietnamese Legal Information Retrieval in Question-Answering System

Thiem Nguyen Ba; Vinh Doan The; Tung Pham Quang; Toan Tran Van

arXiv:2409.13699 · cs.IR · September 24, 2024 · 2 cites

TL;DR

This paper improves Vietnamese legal information retrieval in question-answering systems by developing new data processing, re-ranking, and fusion techniques tailored to the challenges of the Vietnamese language, improving retrieval accuracy and reliability.
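
The page does not describe the paper's re-ranker. As a generic sketch of two-stage re-ranking, assuming a first-stage retriever has already produced a candidate list, the candidates are re-scored with a more precise (usually more expensive) query-document scorer and truncated to the top k. The `rerank` and `overlap_scorer` functions are hypothetical; in practice the scorer would typically be a cross-encoder model, replaced here by simple term overlap so the example runs without dependencies.

```python
from typing import Callable


def rerank(query: str, candidates: list[str],
           scorer: Callable[[str, str], float], top_k: int) -> list[str]:
    """Re-order first-stage candidates by a query-document scorer and keep the top_k.

    Python's sort is stable, so candidates with equal scores keep their
    first-stage order.
    """
    scored = [(doc, scorer(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in scored[:top_k]]


def overlap_scorer(query: str, doc: str) -> float:
    """Hypothetical stand-in for a cross-encoder: fraction of query terms found in the doc."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


# Toy example: "labor contract duration" against three candidate passages.
query = "thời hạn hợp đồng lao động"
candidates = [
    "quy định về thuế",
    "thời hạn của hợp đồng lao động",
    "hợp đồng thuê nhà",
]
print(rerank(query, candidates, overlap_scorer, top_k=2))
# ['thời hạn của hợp đồng lao động', 'hợp đồng thuê nhà']
```

Note that whitespace splitting yields Vietnamese syllables rather than words, since Vietnamese places spaces between the syllables of multi-syllable words; this is one reason language-specific preprocessing matters for retrieval.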

Contribution

The paper introduces novel data processing, normalization in fusion, and re-ranking methods specifically designed for Vietnamese legal QA systems, addressing language-specific retrieval challenges.
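
The exact normalization and fusion formula is not given on this page. As a generic illustration, assuming a hybrid setup in which a lexical retriever (e.g. BM25) and a dense retriever each return raw scores, min-max normalization rescales each retriever's scores to [0, 1] before a weighted sum. Without this step, unbounded BM25 scores would dominate cosine similarities in a raw sum. The function names, weights, and scores below are hypothetical.

```python
def min_max_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one retriever's scores to [0, 1] so different retrievers are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores identical: treat every candidate as equally relevant.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(score_lists: list[dict[str, float]],
         weights: list[float]) -> list[tuple[str, float]]:
    """Weighted sum of normalized scores; a doc missing from a list contributes 0 there."""
    fused: dict[str, float] = {}
    for scores, w in zip(score_lists, weights):
        for doc, s in min_max_normalize(scores).items():
            fused[doc] = fused.get(doc, 0.0) + w * s
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical raw scores from a lexical and a dense retriever.
bm25 = {"d1": 12.0, "d2": 7.5, "d3": 3.0}
dense = {"d2": 0.82, "d3": 0.79, "d4": 0.40}
ranked = fuse([bm25, dense], weights=[0.5, 0.5])
print([doc for doc, _ in ranked])
# ['d2', 'd1', 'd3', 'd4']
```

Min-max normalization is sensitive to outliers in either score list; z-score normalization, or rank-based fusion such as Reciprocal Rank Fusion (which ignores raw scores entirely), are common alternatives.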

Findings

01. Enhanced retrieval accuracy in Vietnamese legal QA systems.
02. Improved document re-ranking and fusion techniques.
03. Significant performance gains demonstrated in experiments.

Abstract

In the modern era of rapidly increasing data volumes, accurately retrieving and recommending relevant documents has become crucial in enhancing the reliability of Question Answering (QA) systems. Recently, Retrieval Augmented Generation (RAG) has gained significant recognition for enhancing the capabilities of large language models (LLMs) by mitigating hallucination issues in QA systems, which is particularly beneficial in the legal domain. Various methods, such as semantic search using dense vector embeddings or a combination of multiple techniques to improve results before feeding them to LLMs, have been proposed. However, these methods often fall short when applied to the Vietnamese language due to several challenges, namely inefficient Vietnamese data processing leading to excessive token length or overly simplistic ensemble techniques that lead to instability and limited…
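
The abstract mentions semantic search using dense vector embeddings. A minimal sketch of that idea, assuming the query and documents have already been encoded into vectors, ranks documents by cosine similarity to the query. The 3-dimensional vectors below are hand-written toy values, not output from any real embedding model, and `dense_search` is a hypothetical helper; at scale, a vector index would replace this linear scan.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors; returns 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def dense_search(query_vec: list[float], doc_vecs: dict[str, list[float]],
                 k: int) -> list[tuple[str, float]]:
    """Return the k documents whose embeddings are most similar to the query embedding."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings (hypothetical values for illustration only).
query_vec = [1.0, 0.0, 1.0]
docs = {
    "d1": [1.0, 0.0, 1.0],
    "d2": [0.0, 1.0, 0.0],
    "d3": [1.0, 1.0, 0.0],
}
print([doc_id for doc_id, _ in dense_search(query_vec, docs, k=2)])
# ['d1', 'd3']
```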

Taxonomy

Topics: Topic Modeling