# Large Scale Question Paraphrase Retrieval with Smoothed Deep Metric Learning

**Authors:** Daniele Bonadiman, Anjishnu Kumar, Arpit Mittal

arXiv: 1905.12786 · 2019-05-31

## TL;DR

This paper introduces a neural question paraphrase retrieval system trained with smoothed deep metric loss (SDML), which outperforms the standard triplet loss when training labels are noisy. The intended applications are large-scale community question answering and open-domain spoken language question answering.

## Contribution

The paper proposes smoothed deep metric loss (SDML) for neural information retrieval. In experiments on two question paraphrase retrieval datasets, SDML retrieves paraphrases more accurately than triplet loss when the labels are noisy.
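
This summary does not state the formula for the smoothed loss, so the following is only a minimal sketch of how the two losses might be contrasted. It assumes that the smoothed loss is a label-smoothed cross-entropy over in-batch candidates, where `queries[i]` and `paraphrases[i]` form a paraphrase pair and all other items in the batch act as negatives. The function names, the squared Euclidean distance, and the `smoothing` default are illustrative assumptions, not taken from the paper.

```python
import math

def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: the negative should be at least
    `margin` farther from the anchor than the positive."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)

def smoothed_batch_loss(queries, paraphrases, smoothing=0.3):
    """Label-smoothed cross-entropy over in-batch candidates (assumed form).

    Requires a batch of at least two pairs. queries[i] is paired with
    paraphrases[i]; every other paraphrases[j] is treated as a negative.
    """
    n = len(queries)
    total = 0.0
    for i, q in enumerate(queries):
        # Logits are negative distances, so closer candidates score higher.
        logits = [-sq_dist(q, p) for p in paraphrases]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        log_probs = [l - log_z for l in logits]
        # Smoothed target: most mass on the paired item, the rest spread evenly.
        off = smoothing / (n - 1)
        targets = [1.0 - smoothing if j == i else off for j in range(n)]
        total -= sum(t * lp for t, lp in zip(targets, log_probs))
    return total / n
```

The intuition behind smoothing, under this reading, is that a mislabeled pair no longer demands probability 1 on a wrong target, which softens the penalty that noisy labels impose on the encoder.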

## Key findings

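- Smoothed deep metric loss (SDML) significantly outperforms triplet loss in the noisy-label setting on two QPR datasets
- The system pairs a neural sentence encoder with an approximate k-Nearest Neighbour index to retrieve equivalent questions efficiently at large scale
- An annotated QPR dataset can be generated automatically from question-answer logs via distant supervision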

## Abstract

The goal of a Question Paraphrase Retrieval (QPR) system is to retrieve equivalent questions that result in the same answer as the original question. Such a system can be used to understand and answer rare and noisy reformulations of common questions by mapping them to a set of canonical forms. This has large-scale applications for community Question Answering (cQA) and open-domain spoken language question answering systems. In this paper we describe a new QPR system implemented as a Neural Information Retrieval (NIR) system consisting of a neural network sentence encoder and an approximate k-Nearest Neighbour index for efficient vector retrieval. We also describe our mechanism to generate an annotated dataset for question paraphrase retrieval experiments automatically from question-answer logs via distant supervision. We show that the standard loss function in NIR, triplet loss, does not perform well with noisy labels. We propose smoothed deep metric loss (SDML) and with our experiments on two QPR datasets we show that it significantly outperforms triplet loss in the noisy label setting.
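
To make the retrieval architecture described above concrete, here is a small sketch of the encode-then-search flow. It is not the authors' system: `encode` is a toy bag-of-words stand-in for the neural sentence encoder, `BruteForceIndex` performs exact search where a real deployment would use an approximate k-Nearest Neighbour index, and the canonical questions are made up for illustration.

```python
import math
from collections import Counter

def encode(text):
    """Toy stand-in for a neural sentence encoder: a bag-of-words count vector."""
    return Counter(text.lower().replace("?", "").split())

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

class BruteForceIndex:
    """Exact nearest-neighbour search over encoded canonical questions.

    A production system would swap this for an approximate k-NN index.
    """
    def __init__(self, canonical_questions):
        self.items = [(q, encode(q)) for q in canonical_questions]

    def search(self, query, k=1):
        qv = encode(query)
        ranked = sorted(self.items, key=lambda item: cosine(qv, item[1]), reverse=True)
        return [q for q, _ in ranked[:k]]

# Hypothetical canonical forms that noisy reformulations get mapped onto.
index = BruteForceIndex([
    "how tall is mount everest",
    "who wrote hamlet",
    "what is the capital of france",
])
best = index.search("what's the capital city of france?", k=1)
```

Here a noisy reformulation is mapped to its closest canonical question, whose stored answer could then be returned. The quality of such a system depends mostly on the encoder, which is why the choice of training loss matters.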

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.12786/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/1905.12786/full.md

## References

24 references — full list in the complete paper: https://tomesphere.com/paper/1905.12786/full.md

---
Source: https://tomesphere.com/paper/1905.12786