Enhanced document retrieval with topic embeddings

Kavsar Huseynova; Jafar Isbarov

arXiv:2408.10435 · cs.IR · August 21, 2024

TL;DR

This paper introduces a topic-aware vectorization method that improves document retrieval accuracy in RAG systems, especially when the corpus contains documents from multiple related topics. Retrieval accuracy is a key bottleneck in RAG performance.

Contribution

The paper presents a new text vectorization technique that incorporates topic information, enhancing retrieval accuracy in RAG architectures compared to existing methods.

Findings

1. Improved retrieval accuracy with topic-aware vectorization
2. Effective handling of multiple related topics in the document corpus
3. Discussion of the challenges in evaluating RAG systems

Abstract

Document retrieval systems have seen renewed interest with the advent of retrieval-augmented generation (RAG). The RAG architecture offers a lower hallucination rate than LLM-only applications. However, the accuracy of the retrieval mechanism is known to be a bottleneck in the performance of these applications. A particular case of subpar retrieval performance arises when the corpus contains multiple documents from several different but related topics. We have devised a new vectorization method that takes into account the topic information of each document. The paper introduces this method for text vectorization and evaluates it in the context of RAG. Furthermore, we discuss the challenge of evaluating RAG systems, which is relevant to the case at hand.
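The abstract says the new vectorization "takes into account the topic information of the document" but does not describe the construction. The sketch below is a hypothetical illustration of the general idea, not the authors' method. It assumes each document already has a dense embedding and a topic label (for example from a topic model), represents each topic by the normalized centroid of its documents' embeddings, and appends an `alpha`-weighted copy of that centroid to every document vector. Queries carry no label, so they borrow the centroid of the nearest topic. All function names, the `alpha` weight, and the toy 2-D vectors are assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]

# Toy corpus (hypothetical): unit 2-D "embeddings" with a topic label per document.
DOCS: Dict[str, Vector] = {
    "tax_1": [1.0, 0.0],
    "tax_2": [0.8, 0.6],
    "visa_1": [0.0, 1.0],
    "visa_2": [0.6, 0.8],
}
TOPICS: Dict[str, str] = {"tax_1": "tax", "tax_2": "tax", "visa_1": "visa", "visa_2": "visa"}


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def topic_centroids(doc_vecs: Dict[str, Vector], doc_topics: Dict[str, str]) -> Dict[str, Vector]:
    """Average the unit-normalized vectors of all documents sharing a topic, then renormalize."""
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = {}
    for doc_id, vec in doc_vecs.items():
        topic = doc_topics[doc_id]
        acc = sums.setdefault(topic, [0.0] * len(vec))
        for i, x in enumerate(normalize(vec)):
            acc[i] += x
        counts[topic] = counts.get(topic, 0) + 1
    return {t: normalize([x / counts[t] for x in acc]) for t, acc in sums.items()}


def topic_aware_vector(vec: Sequence[float], topic_vec: Sequence[float], alpha: float) -> Vector:
    """Hypothetical construction: [unit text vector ; alpha * topic centroid], renormalized."""
    return normalize(normalize(vec) + [alpha * x for x in topic_vec])


def nearest_topic(query: Sequence[float], centroids: Dict[str, Vector]) -> str:
    """Queries have no topic label, so pick the topic whose centroid is most similar."""
    return max(centroids, key=lambda t: cosine(query, centroids[t]))


def build_index(
    doc_vecs: Dict[str, Vector], doc_topics: Dict[str, str], alpha: float
) -> Tuple[Dict[str, Vector], Dict[str, Vector]]:
    """Return topic-aware document vectors plus the topic centroids used to build them."""
    centroids = topic_centroids(doc_vecs, doc_topics)
    index = {d: topic_aware_vector(v, centroids[doc_topics[d]], alpha) for d, v in doc_vecs.items()}
    return index, centroids


def retrieve(
    query: Sequence[float], index: Dict[str, Vector], centroids: Dict[str, Vector], alpha: float, k: int = 3
) -> List[str]:
    """Rank documents by dot product of unit vectors (i.e. cosine) and return the top k ids."""
    q = topic_aware_vector(query, centroids[nearest_topic(query, centroids)], alpha)
    scores = {doc_id: sum(a * b for a, b in zip(q, v)) for doc_id, v in index.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    for alpha in (0.0, 1.0):
        index, centroids = build_index(DOCS, TOPICS, alpha)
        print(alpha, retrieve([0.6, 0.8], index, centroids, alpha, k=2))
    # 0.0 ['visa_2', 'tax_2']
    # 1.0 ['visa_2', 'visa_1']
```

Because both concatenated vectors end up with the same norm, the score in this sketch equals (d·q + α²·c_d·c_q) / (1 + α²). With `alpha=0.0` it reduces to plain cosine similarity and ranks the off-topic `tax_2` second; with `alpha=1.0` the same-topic bonus promotes `visa_1` instead. Whether such a bonus helps on a real corpus of related topics is an empirical question that this toy example does not answer.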

Taxonomy

Topics: Advanced Text Analysis Techniques · Topic Modeling · Text and Document Classification Technologies

Methods: Linear Layer · WordPiece · Residual Connection · Multi-Head Attention · Linear Warmup With Linear Decay · Attention Dropout · Adam · Layer Normalization · Weight Decay