Vietnamese Legal Information Retrieval in Question-Answering System
Thiem Nguyen Ba, Vinh Doan The, Tung Pham Quang, Toan Tran Van

TL;DR
This paper improves Vietnamese legal information retrieval in question-answering systems by developing new data processing, re-ranking, and fusion techniques tailored to language-specific challenges, enhancing accuracy and reliability.
Contribution
The paper introduces novel data processing, normalization in fusion, and re-ranking methods specifically designed for Vietnamese legal QA systems, addressing language-specific retrieval challenges.
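To make "normalization in fusion" concrete, here is a minimal sketch of one common way to combine a lexical retriever and a dense retriever: min-max normalize each score list, then take a weighted sum. This is an illustration only and not necessarily the authors' method; the function names, the `weight_lexical` parameter, and the choice of min-max scaling are all assumptions introduced for this example.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one retriever's scores to [0, 1] so they are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so there is no ranking signal to preserve.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse_scores(
    lexical: Dict[str, float],
    dense: Dict[str, float],
    weight_lexical: float = 0.5,  # hypothetical mixing weight, not from the paper
) -> List[Tuple[str, float]]:
    """Weighted sum of normalized scores; a missing document counts as 0."""
    lex_n = min_max_normalize(lexical)
    den_n = min_max_normalize(dense)
    fused = {
        doc_id: weight_lexical * lex_n.get(doc_id, 0.0)
        + (1.0 - weight_lexical) * den_n.get(doc_id, 0.0)
        for doc_id in set(lex_n) | set(den_n)
    }
    # Sort by descending fused score, breaking ties by document id.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))
```

Without the normalization step, a retriever whose raw scores span a larger range (for example, unbounded lexical scores versus cosine similarities) would dominate the sum regardless of the weight.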
Findings
Enhanced retrieval accuracy in Vietnamese legal QA systems.
Improved document re-ranking and fusion techniques.
Significant performance gains demonstrated in experiments.
Abstract
In the modern era of rapidly increasing data volumes, accurately retrieving and recommending relevant documents has become crucial in enhancing the reliability of Question Answering (QA) systems. Recently, Retrieval Augmented Generation (RAG) has gained significant recognition for enhancing the capabilities of large language models (LLMs) by mitigating hallucination issues in QA systems, which is particularly beneficial in the legal domain. Various methods, such as semantic search using dense vector embeddings or a combination of multiple techniques to improve results before feeding them to LLMs, have been proposed. However, these methods often fall short when applied to the Vietnamese language due to several challenges, namely inefficient Vietnamese data processing leading to excessive token length or overly simplistic ensemble techniques that lead to instability and limited…
Taxonomy
Topics: Topic Modeling
