Sparse and Dense Approaches for the Full-rank Retrieval of Responses for Dialogues

Gustavo Penha; Claudia Hauff

arXiv:2204.10558·cs.IR·April 25, 2022

TL;DR

This paper compares sparse and dense retrieval methods for full-rank response retrieval in dialogues, emphasizing the importance of effective first-stage retrieval in large response sets and proposing techniques that improve retrieval performance.

Contribution

It introduces a comprehensive analysis of sparse and dense retrieval approaches for large-scale dialogue response retrieval, highlighting the effectiveness of learned response expansion and fine-tuned dense models.

Findings

01 Dense retrieval with intermediate training outperforms the other methods compared.

02 Learned response expansion is a strong baseline for sparse retrieval.

03 Hard negative sampling can hurt dense retrieval performance.

Abstract

Ranking responses for a given dialogue context is a popular benchmark in which the setup is to re-rank the ground-truth response over a limited set of $n$ responses, where $n$ is typically 10. The predominance of this setup in conversation response ranking has led to a great deal of attention to building neural re-rankers, while the first-stage retrieval step has been overlooked. Since the correct answer is always available in the candidate list of $n$ responses, this artificial evaluation setup assumes that there is a first-stage retrieval step which is always able to rank the correct response in its top-$n$ list. In this paper we focus on the more realistic task of full-rank retrieval of responses, where $n$ can be up to millions of responses. We investigate both dialogue context and response expansion techniques for sparse retrieval, as well as zero-shot and fine-tuned dense…
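
To make the difference between the two evaluation setups concrete, here is a minimal sketch in Python. It is not the authors' code and does not use the transformer_rankers library: the `TinyBM25` class, the `recall_at_k` helper, and the toy responses are all hypothetical and exist only for illustration. It scores responses with a small BM25 implementation (a common sparse retrieval function) and contrasts re-ranking a candidate list that already contains the ground-truth response with ranking the entire response collection.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; real systems use proper analyzers.
    return text.lower().split()


class TinyBM25:
    """Hypothetical, minimal BM25 scorer over a list of responses."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.doc_tokens = [tokenize(d) for d in docs]
        self.doc_len = [len(t) for t in self.doc_tokens]
        self.avg_len = sum(self.doc_len) / len(docs)
        self.tf = [Counter(t) for t in self.doc_tokens]
        # Document frequency: number of responses containing each term.
        df = Counter(term for t in self.doc_tokens for term in set(t))
        n = len(docs)
        self.idf = {
            term: math.log(1 + (n - f + 0.5) / (f + 0.5))
            for term, f in df.items()
        }

    def score(self, query, idx):
        total = 0.0
        for term in tokenize(query):
            f = self.tf[idx].get(term, 0)
            if f == 0:
                continue
            length_norm = 1 - self.b + self.b * self.doc_len[idx] / self.avg_len
            total += self.idf[term] * f * (self.k1 + 1) / (f + self.k1 * length_norm)
        return total

    def rank(self, query, candidates=None):
        # candidates=None means full-rank retrieval over every response.
        ids = range(len(self.doc_tokens)) if candidates is None else candidates
        return sorted(ids, key=lambda i: self.score(query, i), reverse=True)


def recall_at_k(ranking, gold, k):
    # 1.0 if the ground-truth response is among the top-k, else 0.0.
    return 1.0 if gold in ranking[:k] else 0.0


# Toy data (invented for this example).
responses = [
    "try restarting the network manager service",
    "which ubuntu version are you running",
    "you can install it with apt",
    "thanks that fixed it",
]
context = "my wifi stopped working after the update how do i restart the network"
gold = 0

bm25 = TinyBM25(responses)

# Re-ranking setup: the gold response is guaranteed to be in the candidate
# list, so recall at k = len(candidates) is 1.0 by construction.
candidates = [0, 2]
rerank_recall = recall_at_k(bm25.rank(context, candidates), gold, len(candidates))

# Full-rank setup: the retriever must find the gold response in the whole
# collection, so recall at a small k is no longer guaranteed.
full_ranking = bm25.rank(context)
full_recall_at_1 = recall_at_k(full_ranking, gold, 1)
```

In the re-ranking setup the recall at the candidate-list size says nothing about retrieval quality, because the ground truth was placed in the list beforehand. In the full-rank setup the same metric depends entirely on the first-stage retriever, which is the step the abstract argues has been overlooked.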

Code & Models

Repositories

Guzpenha/transformer_rankers
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · Multimodal Machine Learning Applications