Enhanced vectors for top-k document retrieval in Question Answering

Mohammed Hammad

arXiv:2210.10584·cs.IR·October 20, 2022

TL;DR

This paper introduces a novel dense vector embedding method for document retrieval in question answering systems, enabling fast and accurate identification of relevant passages with real-time query processing.

Contribution

It proposes a unique embedding technique that incorporates passage identifiers into dense vectors, improving retrieval efficiency and accuracy in QA applications.

Findings

01. Real-time query vector creation in ~4 milliseconds

02. Enhanced retrieval accuracy for relevant documents

03. Efficient embedding of passage identifiers into vector space

Abstract

Modern-day applications, especially information retrieval web apps whose use cases involve "search", are gradually moving towards "answering" modules. Conversational chatbots, which have proven to be more engaging to users, use Question Answering as their core. Since precise answering is computationally expensive, several approaches have been developed to prefetch the most relevant documents/passages from the database that contain the answer. We propose a different approach that retrieves the evidence documents efficiently and accurately, making sure that the relevant document for a given user query is not missed. We do so by assigning each document (or passage, in our case) a unique identifier and using these identifiers to create dense vectors which can be efficiently indexed. More precisely, we use the identifier to predict randomly sampled context window words of the relevant question…
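The idea described in the abstract, where a passage identifier is trained to predict words sampled from a context window of its relevant question, can be illustrated with a small sketch. The abstract is truncated here, so this is not the paper's implementation: the full-softmax objective, the function names `train_passage_vectors` and `retrieve`, the hyperparameters, and the query-time scoring (averaging the output vectors of the query words and taking a dot product with each passage vector) are all assumptions made for illustration.

```python
import numpy as np

def train_passage_vectors(questions, passage_ids, dim=16, window=3,
                          epochs=200, lr=0.1, seed=0):
    """Learn one dense vector per passage identifier (hypothetical sketch).

    Each passage vector is trained to predict the words of a randomly
    positioned context window taken from a question that the passage answers.
    """
    rng = np.random.default_rng(seed)
    vocab = sorted({w for q in questions for w in q.split()})
    word_index = {w: i for i, w in enumerate(vocab)}
    n_passages = max(passage_ids) + 1
    P = rng.normal(scale=0.1, size=(n_passages, dim))  # passage-identifier vectors
    W = np.zeros((len(vocab), dim))                    # output word vectors
    for _ in range(epochs):
        for question, pid in zip(questions, passage_ids):
            tokens = question.split()
            # Randomly sample where the context window starts.
            start = rng.integers(0, max(1, len(tokens) - window + 1))
            for word in tokens[start:start + window]:
                logits = W @ P[pid]
                probs = np.exp(logits - logits.max())
                probs /= probs.sum()
                probs[word_index[word]] -= 1.0  # cross-entropy gradient w.r.t. logits
                grad_p = W.T @ probs            # computed before W is updated
                W -= lr * np.outer(probs, P[pid])
                P[pid] -= lr * grad_p
    return P, W, word_index

def retrieve(query, P, W, word_index, k=1):
    """Return the top-k passage identifiers for a query (assumed scoring rule)."""
    ids = [word_index[w] for w in query.split() if w in word_index]
    if not ids:
        return []
    query_vec = W[ids].mean(axis=0)  # query vector: mean of known word vectors
    scores = P @ query_vec           # dot-product similarity to each passage
    return np.argsort(-scores)[:k].tolist()
```

Under these assumptions, building a query vector costs only a lookup and an average, and the passage vectors `P` could be handed to any nearest-neighbour index. A real system would likely replace the full softmax with a cheaper objective for large vocabularies.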

Taxonomy

Topics: Topic Modeling · Recommender Systems and Techniques · Text and Document Classification Technologies