An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction

Stefan Larson; Anish Mahendran; Joseph J. Peper; Christopher Clarke; Andrew Lee; Parker Hill; Jonathan K. Kummerfeld; Kevin Leach; Michael A. Laurenzano; Lingjia Tang; Jason Mars

arXiv:1909.02027 · cs.CL · September 6, 2019

TL;DR

This paper introduces a new dataset for intent classification in task-oriented dialog systems that includes out-of-scope queries, highlighting the challenge of detecting queries outside supported intents and providing a benchmark for future research.

Contribution

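The paper presents a dataset that covers 150 intent classes across 10 domains and adds out-of-scope queries that belong to none of them. It evaluates a range of benchmark classifiers and several out-of-scope identification schemes on this data, addressing a gap in realistic benchmarking of intent detection.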

Findings

01. Classifiers perform well on in-scope intent classification.

02. Detection of out-of-scope queries remains challenging.

03. The dataset enables more realistic benchmarking of dialog systems.

Abstract

Task-oriented dialog systems need to know when a query falls outside their range of supported intents, but current text classification corpora only define label sets that cover every example. We introduce a new dataset that includes queries that are out-of-scope, i.e., queries that do not fall into any of the system's supported intents. This poses a new challenge because models cannot assume that every query at inference time belongs to a system-supported intent class. Our dataset also covers 150 intent classes over 10 domains, capturing the breadth that a production task-oriented agent must handle. We evaluate a range of benchmark classifiers on our dataset along with several different out-of-scope identification schemes. We find that while the classifiers perform well on in-scope intent classification, they struggle to identify out-of-scope queries. Our dataset and evaluation fill an…
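To make the idea of an out-of-scope identification scheme concrete, here is a minimal sketch of one common approach: rejecting a query as out-of-scope when the classifier's top softmax probability falls below a threshold. It is illustrative only and is not presented as the paper's implementation. The label name `oos`, the 0.7 threshold, the example intent names, and every function name are assumptions made for this example. The two metrics mirror the distinction drawn in the abstract between in-scope classification and out-of-scope detection.

```python
import math
from typing import List, Sequence

OOS_LABEL = "oos"  # hypothetical name for the out-of-scope label


def softmax(scores: Sequence[float]) -> List[float]:
    """Convert raw classifier scores into probabilities."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_threshold(
    scores: Sequence[float], intents: Sequence[str], threshold: float = 0.7
) -> str:
    """Return the top intent, or OOS_LABEL if its probability is below threshold.

    The threshold value is an assumption; in practice it would be tuned on
    held-out data.
    """
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return intents[best] if probs[best] >= threshold else OOS_LABEL


def in_scope_accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Accuracy computed only over queries whose gold label is in scope."""
    pairs = [(g, p) for g, p in zip(gold, pred) if g != OOS_LABEL]
    return sum(g == p for g, p in pairs) / len(pairs) if pairs else 0.0


def out_of_scope_recall(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of gold out-of-scope queries that were predicted out-of-scope."""
    preds = [p for g, p in zip(gold, pred) if g == OOS_LABEL]
    return sum(p == OOS_LABEL for p in preds) / len(preds) if preds else 0.0


if __name__ == "__main__":
    # Hypothetical intents and scores, for illustration only.
    intents = ["book_flight", "check_balance", "set_alarm"]
    print(predict_with_threshold([5.0, 0.1, 0.2], intents))  # confident score
    print(predict_with_threshold([1.0, 1.1, 0.9], intents))  # near-uniform scores
```

Reporting the two metrics separately matters because a model can score highly on in-scope accuracy while still assigning confident in-scope labels to most out-of-scope queries.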
