A Proposed Large Language Model-Based Smart Search for Archive System

Ha Dung Nguyen; Thi-Hoang Anh Nguyen; Thanh Binh Nguyen

arXiv:2501.07024·cs.AI·January 14, 2025

TL;DR

This paper introduces a novel LLM-based smart search framework for digital archives, utilizing RAG techniques to improve retrieval accuracy, handle multilingual queries, and enhance archival search efficiency.

Contribution

The paper proposes an integrated architecture that combines advanced metadata generation, hybrid retrieval, and response synthesis, and it reports significant performance improvements over traditional methods.

Findings

01. Enhanced search precision and relevance.

02. Effective multilingual query handling.

03. Improved efficiency in archival information retrieval.

Abstract

This study presents a novel framework for smart search in digital archival systems, leveraging the capabilities of Large Language Models (LLMs) to enhance information retrieval. By employing a Retrieval-Augmented Generation (RAG) approach, the framework enables the processing of natural language queries and the transformation of non-textual data into meaningful textual representations. The system integrates advanced metadata generation techniques, a hybrid retrieval mechanism, a router query engine, and robust response synthesis; the results showed improved search precision and relevance. We present the architecture and implementation of the system and evaluate its performance in four experiments concerning LLM efficiency, hybrid retrieval optimizations, multilingual query handling, and the impact of individual components. The results obtained show significant improvements over conventional approaches and…
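The abstract names a hybrid retrieval mechanism but, as excerpted here, does not say how the results of different retrievers are merged. As an illustration only, the sketch below uses reciprocal rank fusion (RRF), a common way to combine a keyword ranking with an embedding-based ranking. RRF is an assumption rather than the authors' stated method, and the function name, the document ids, and the constant k=60 are hypothetical choices made for this example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default, not a value
    taken from the paper.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical outputs of a keyword retriever and an embedding retriever.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, while documents found by only one retriever are still kept. In a RAG pipeline, the fused list would typically be passed on to the response synthesis step.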
