Enhanced document retrieval with topic embeddings
Kavsar Huseynova, Jafar Isbarov

TL;DR
This paper introduces a novel topic-aware vectorization method to improve document retrieval accuracy in RAG systems, especially when multiple related topics are present, addressing a key bottleneck in retrieval performance.
Contribution
The paper presents a new text vectorization technique that incorporates topic information, enhancing retrieval accuracy in RAG architectures compared to existing methods.
Findings
Improved retrieval accuracy with topic-aware vectorization
Effective handling of multiple related topics in document corpus
Discussion on challenges in evaluating RAG systems
Abstract
Document retrieval systems have seen renewed interest with the advent of retrieval-augmented generation (RAG). The RAG architecture offers a lower hallucination rate than LLM-only applications. However, the accuracy of the retrieval mechanism is known to be a bottleneck in the efficiency of these applications. Retrieval performs particularly poorly when the corpus contains multiple documents from several different but related topics. We have devised a new vectorization method that takes the document's topic information into account. The paper introduces this method for text vectorization and evaluates it in the context of RAG. Furthermore, we discuss the challenge of evaluating RAG systems, which is relevant to the case at hand.
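The abstract does not spell out how topic information enters the vector, so the following is only an illustrative sketch of one plausible reading, not the authors' method. It assumes each document and query carries a known topic label, and it appends a scaled one-hot topic vector to a toy bag-of-words embedding. The names `topic_aware_vector`, `retrieve`, and the weight `alpha` are hypothetical.

```python
import math
from collections import Counter

def bow_vector(text, vocab):
    # Toy text embedding: word counts over a fixed vocabulary.
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocab]

def normalize(v):
    # Scale to unit length; leave an all-zero vector unchanged.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else v

def topic_aware_vector(text, topic, vocab, topics, alpha=1.0):
    # ASSUMPTION: topic information is injected by concatenating a
    # one-hot topic vector (scaled by alpha) to the text embedding.
    text_part = normalize(bow_vector(text, vocab))
    topic_part = [alpha if t == topic else 0.0 for t in topics]
    return normalize(text_part + topic_part)

def cosine(a, b):
    # Inputs are already unit length, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

def retrieve(query, query_topic, docs, alpha=1.0):
    # docs is a list of (text, topic) pairs; returns the best-scoring pair.
    topics = sorted({t for _, t in docs})
    vocab = sorted({w for text, _ in docs for w in text.lower().split()})
    q = topic_aware_vector(query, query_topic, vocab, topics, alpha)
    best = max(
        docs,
        key=lambda d: cosine(q, topic_aware_vector(d[0], d[1], vocab, topics, alpha)),
    )
    return best

# Two near-identical documents from different but related topics.
docs = [
    ("the deadline for filing an appeal is thirty days", "tax"),
    ("the deadline for filing an appeal is ten days", "labor"),
]
query = "what is the deadline for filing an appeal"
print(retrieve(query, "labor", docs))
print(retrieve(query, "tax", docs))
```

In this toy corpus the two documents are equally similar to the query by text alone, so the topic component is what separates them. In practice the query's topic would have to come from somewhere, for example a classifier or user context, which is a further assumption of this sketch.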
Taxonomy
Topics
Advanced Text Analysis Techniques · Topic Modeling · Text and Document Classification Technologies
Methods
Linear Layer · WordPiece · Residual Connection · Multi-Head Attention · Linear Warmup With Linear Decay · Attention Dropout · Adam · Layer Normalization · Weight Decay
