Neural Sentence Embedding using Only In-domain Sentences for Out-of-domain Sentence Detection in Dialog Systems

Seonghan Ryu; Seokhwan Kim; Junhwi Choi; Hwanjo Yu; Gary Geunbae Lee

arXiv:1807.11567 · cs.CL · August 1, 2018

TL;DR

This paper introduces a neural sentence embedding method trained only on in-domain sentences. It uses domain-category analysis as an auxiliary task, then trains an autoencoder on the learned sentence representations to detect out-of-domain sentences in dialog systems, and is reported to outperform existing methods.

Contribution

The paper presents a novel neural embedding method that leverages only in-domain data and auxiliary tasks for out-of-domain sentence detection in dialog systems.

Findings

01. Achieved highest accuracy across all tested domains.

02. Outperformed state-of-the-art methods in OOD detection.

03. Effective in eight-domain dialog system evaluations.

Abstract

To ensure satisfactory user experience, dialog systems must be able to determine whether an input sentence is in-domain (ID) or out-of-domain (OOD). We assume that only ID sentences are available as training data because collecting enough OOD sentences in an unbiased way is a laborious and time-consuming job. This paper proposes a novel neural sentence embedding method that represents sentences in a low-dimensional continuous vector space that emphasizes aspects that distinguish ID cases from OOD cases. We first used a large set of unlabeled text to pre-train word representations that are used to initialize neural sentence embedding. Then we used domain-category analysis as an auxiliary task to train neural sentence embedding for OOD sentence detection. After the sentence representations were learned, we used them to train an autoencoder aimed at OOD sentence detection. We evaluated our…
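The final step in the abstract, training an autoencoder on ID sentence representations and using it for OOD detection, can be illustrated with a small sketch. This is not the authors' implementation: the excerpt does not specify the autoencoder architecture or the decision rule, so the sketch assumes a linear autoencoder fitted in closed form (equivalent to PCA), a reconstruction-error score, and a threshold set at the 99th percentile of ID training errors. The sentence vectors are synthetic stand-ins for learned embeddings, and all names (`fit_linear_autoencoder`, `reconstruction_error`, `is_ood`) are hypothetical.

```python
import numpy as np

def fit_linear_autoencoder(X, k):
    """Fit a k-dimensional linear autoencoder in closed form (PCA via SVD)."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions; the top k act as tied encoder/decoder weights.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def reconstruction_error(X, mean, components):
    """Squared L2 distance between each vector and its reconstruction."""
    centered = X - mean
    code = centered @ components.T   # encode into k dimensions
    recon = code @ components        # decode back to the input space
    return np.sum((centered - recon) ** 2, axis=1)

# Synthetic stand-ins for sentence embeddings (assumption, not real data):
# ID vectors lie near a k-dimensional subspace, OOD vectors do not.
rng = np.random.default_rng(0)
dim, k = 16, 4
basis = rng.normal(size=(k, dim))
id_train = rng.normal(size=(500, k)) @ basis + 0.05 * rng.normal(size=(500, dim))
id_test = rng.normal(size=(100, k)) @ basis + 0.05 * rng.normal(size=(100, dim))
ood_test = 2.0 * rng.normal(size=(100, dim))

# Train on ID vectors only; no OOD examples are used for fitting or thresholding.
mean, components = fit_linear_autoencoder(id_train, k)
threshold = np.percentile(reconstruction_error(id_train, mean, components), 99)

def is_ood(X):
    """Flag vectors whose reconstruction error exceeds the ID-derived threshold."""
    return reconstruction_error(X, mean, components) > threshold

print("OOD flagged:", is_ood(ood_test).mean())
print("ID flagged: ", is_ood(id_test).mean())
```

The design point the sketch shares with the abstract is that both the model and the threshold are derived from ID data alone. A nonlinear autoencoder and real sentence embeddings would replace the closed-form fit and the synthetic vectors.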
