Text Embeddings for Retrieval From a Large Knowledge Base

Tolgahan Cakaloglu; Christian Szegedy; Xiaowei Xu

arXiv:1810.10176 · cs.IR · May 3, 2019 · 5 cites

TL;DR

This paper evaluates different text embedding methods for document retrieval in open-domain question answering, demonstrating that neural augmentation and deeper models significantly improve retrieval accuracy.

Contribution

It introduces deep residual neural models, trained specifically for retrieval, that augment existing text embeddings; deeper models perform better.
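The page does not describe the augmentation architecture beyond the abstract's mention of deep residual models. As a rough illustration only, the sketch below assumes a stack of fully connected residual blocks, each computing y = x + W2·relu(W1·x) on a fixed base embedding. The helper names (`residual_block`, `augment`, `random_matrix`) are hypothetical, and the weights are untrained random placeholders, not parameters learned for retrieval.

```python
import random


def relu(v):
    # Element-wise rectified linear unit.
    return [max(0.0, x) for x in v]


def matvec(W, v):
    # Dense matrix-vector product; W is a list of rows.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def residual_block(x, W1, W2):
    """y = x + W2 . relu(W1 . x): the identity path carries the input through."""
    h = relu(matvec(W1, x))
    delta = matvec(W2, h)
    return [a + b for a, b in zip(x, delta)]


def augment(x, blocks):
    # Apply blocks in sequence; "depth" is the number of (W1, W2) pairs.
    for W1, W2 in blocks:
        x = residual_block(x, W1, W2)
    return x


def random_matrix(rows, cols, scale, rng):
    # Hypothetical placeholder weights (untrained), drawn from N(0, scale^2).
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    dim, hidden, depth = 4, 8, 3
    blocks = [
        (random_matrix(hidden, dim, 0.1, rng), random_matrix(dim, hidden, 0.1, rng))
        for _ in range(depth)
    ]
    base = [0.1, -0.2, 0.3, 0.4]  # stand-in for a base text embedding
    print(augment(base, blocks))
```

Because each block only adds a correction to its input, a block whose second matrix is all zeros passes the base embedding through unchanged; this is a common motivation for residual connections when stacking many layers.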

Findings

01. Neural augmentation improves top-1 recall by 14%.

02. Deeper neural models outperform shallower ones.

03. Augmentation yields significant gains over base embeddings.
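Top-k recall is taken here to mean the fraction of questions whose gold paragraph appears among the k highest-ranked paragraphs. A minimal sketch, assuming exactly one gold paragraph per question and rankings given as lists of paragraph ids; `recall_at_k` is a hypothetical helper name.

```python
def recall_at_k(ranked_ids, gold_ids, k):
    """Fraction of questions whose gold paragraph id is in the top k of its ranking.

    ranked_ids: one ranked list of paragraph ids per question (best first).
    gold_ids: the single gold paragraph id for each question (assumption).
    """
    if not gold_ids:
        return 0.0
    hits = sum(1 for ranked, gold in zip(ranked_ids, gold_ids) if gold in ranked[:k])
    return hits / len(gold_ids)


if __name__ == "__main__":
    rankings = [[3, 1, 2], [0, 2, 1], [2, 0, 1]]
    gold = [3, 2, 1]
    for k in (1, 2, 3):
        print(k, recall_at_k(rankings, gold, k))
```

With these toy rankings, top-1, top-2, and top-3 recall are 1/3, 2/3, and 1.0, so the metric is non-decreasing in k.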

Abstract

Text embeddings that represent natural language documents in a semantic vector space can be used for document retrieval via nearest-neighbor lookup. To study the feasibility of neural models specialized for semantically meaningful retrieval, we propose using the Stanford Question Answering Dataset (SQuAD) in an open-domain question answering setting, where the first task is to find paragraphs useful for answering a given question. First, we compare the retrieval quality of various text-embedding methods and give an extensive empirical comparison of various non-augmented base embeddings with and without IDF weighting. Our main result is that training deep residual neural models specifically for retrieval can yield significant gains when they are used to augment existing embeddings. We also establish that deeper…
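The retrieval setup in the abstract (embed the question and every paragraph, then return the nearest paragraphs) can be sketched in plain Python. This assumes a base embedding formed as an IDF-weighted average of word vectors and cosine similarity as the nearest-neighbor score. The toy vectors, the smoothed IDF formula log((1 + N) / (1 + df)) + 1, and all function names are illustrative assumptions, not the paper's exact configuration.

```python
import math
from collections import Counter


def idf_weights(docs):
    """Smoothed IDF (assumed variant): log((1 + N) / (1 + df)) + 1 per token."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def embed(tokens, word_vecs, idf=None):
    """Weighted mean of word vectors; tokens without a vector are skipped."""
    dim = len(next(iter(word_vecs.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        vec = word_vecs.get(tok)
        if vec is None:
            continue
        w = idf.get(tok, 1.0) if idf else 1.0  # unweighted mean when idf is None
        total = [t + w * v for t, v in zip(total, vec)]
        weight_sum += w
    return [t / weight_sum for t in total] if weight_sum else total


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def nearest_paragraphs(question, paragraphs, word_vecs, idf=None, k=1):
    """Brute-force nearest-neighbor lookup: indices of the k most similar paragraphs."""
    q = embed(question, word_vecs, idf)
    scores = [cosine(q, embed(p, word_vecs, idf)) for p in paragraphs]
    order = sorted(range(len(paragraphs)), key=lambda i: scores[i], reverse=True)
    return order[:k]


if __name__ == "__main__":
    word_vecs = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "stock": [0.0, 1.0],
                 "market": [0.1, 0.9], "the": [0.5, 0.5]}
    paragraphs = [["the", "stock", "market"], ["the", "cat", "and", "the", "dog"]]
    question = ["the", "dog"]
    print(nearest_paragraphs(question, paragraphs, word_vecs))  # prints [1]
    idf = idf_weights(paragraphs)
    print(nearest_paragraphs(question, paragraphs, word_vecs, idf))  # prints [1]
```

IDF weighting down-weights tokens that occur in many paragraphs (here "the"), so rarer, more topical words dominate the averaged vector. The scan above compares the question against every paragraph; for a large knowledge base one would typically precompute paragraph embeddings and use a nearest-neighbor index instead.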

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications