Analysing the Robustness of Dual Encoders for Dense Retrieval Against Misspellings

Georgios Sidiropoulos; Evangelos Kanoulas

arXiv:2205.02303·cs.IR·May 6, 2022

TL;DR

This paper investigates how dual-encoder dense retrieval models perform under noisy conditions with typos, revealing significant performance drops and proposing data augmentation with contrastive learning to enhance robustness.

Contribution

The paper combines data augmentation with contrastive learning to make dual-encoder dense retrievers more robust to typos in user questions.
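To make the idea of a contrastive objective concrete, here is a minimal sketch of an InfoNCE-style loss in plain Python. It is not taken from the paper or its repository: the function name `info_nce`, the `temperature` parameter, and the toy vectors are all assumptions made for illustration. The anchor stands for an embedded question, the positive for something it should stay close to (for example a relevant passage, or a typoed variant of the same question), and the negatives for everything else in the batch.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce(anchor, positive, negatives, temperature=1.0):
    """InfoNCE-style loss: negative log-softmax score of the positive.

    Hypothetical helper for illustration only; similarity is a plain
    dot product, as is common for dual encoders.
    """
    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[0]


# Toy embeddings (made up): the loss is lower when the anchor is
# aligned with its positive than when it is aligned with a negative.
aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

In a real training loop the vectors would come from the question and passage encoders and the loss would be averaged over a batch; a deep learning framework would replace the hand-written log-sum-exp.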

Findings

01 Performance drops significantly when user questions contain typos.

02 Data augmentation combined with contrastive learning improves robustness.

03 Different types of typos affect the embeddings differently.
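The findings refer to different types of typos and to data augmentation. The sketch below shows one simple way such noisy question variants could be generated. The four edit kinds (delete, swap, insert, substitute), the function names `add_typo` and `augment_question`, and the sample question are assumptions for illustration; they are not claimed to match the typo generators used in the paper.

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
KINDS = ("delete", "swap", "insert", "substitute")


def add_typo(word, kind, rng):
    """Apply one character-level edit of the given kind to a word."""
    if len(word) < 2:
        return word  # too short to edit safely
    i = rng.randrange(len(word) - 1)
    if kind == "delete":
        return word[:i] + word[i + 1:]
    if kind == "swap":  # transpose two neighbouring characters
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    if kind == "insert":
        return word[:i] + rng.choice(ALPHABET) + word[i:]
    if kind == "substitute":  # may occasionally pick the same letter
        return word[:i] + rng.choice(ALPHABET) + word[i + 1:]
    raise ValueError(f"unknown typo kind: {kind}")


def augment_question(question, rng, kinds=KINDS):
    """Return a copy of the question with a typo in one random word."""
    words = question.split()
    if not words:
        return question
    j = rng.randrange(len(words))
    words[j] = add_typo(words[j], rng.choice(kinds), rng)
    return " ".join(words)


rng = random.Random(0)  # fixed seed so runs are repeatable
noisy = augment_question("who wrote the first dense retriever", rng)
```

During training, each clean question could be paired with one or more such noisy variants, so the question encoder sees misspellings alongside the curated text.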

Abstract

Dense retrieval is becoming one of the standard approaches for document and passage ranking. The dual-encoder architecture is widely adopted for scoring question-passage pairs due to its efficiency and high performance. Typically, dense retrieval models are evaluated on clean and curated datasets. However, when deployed in real-life applications, these models encounter noisy user-generated text. That said, the performance of state-of-the-art dense retrievers can substantially deteriorate when exposed to noisy text. In this work, we study the robustness of dense retrievers against typos in the user question. We observe a significant drop in the performance of the dual-encoder model when encountering typos and explore ways to improve its robustness by combining data augmentation with contrastive learning. Our experiments on two large-scale passage ranking and open-domain question…

Code & Models

Repositories

gsidiropoulos/dense-retrieval-against-misspellings (PyTorch, official)