Benchmarking Natural Language Understanding Services for building Conversational Agents

Xingkun Liu; Arash Eshghi; Pawel Swietojanski; Verena Rieser

arXiv:1903.05566 · cs.CL · March 27, 2019 · 88 cites

TL;DR

This paper evaluates four popular NLU services (Watson, Dialogflow, LUIS, and Rasa) on a 21-domain dataset of 25K user utterances, comparing their performance on intent classification and entity type recognition.

Contribution

It presents the first wide-coverage, multi-domain benchmark of popular NLU services, together with a dataset annotated with Intent and Entity Type specifications that the authors state will be released with the paper.

Findings

01 Watson outperforms others in intent classification

02 Watson performs poorly in entity type recognition due to low precision

03 Dialogflow, LUIS, and Rasa perform well across tasks

Abstract

We have recently seen the emergence of several publicly available Natural Language Understanding (NLU) toolkits, which map user utterances to structured, but more abstract, Dialogue Act (DA) or Intent specifications, while making this process accessible to the lay developer. In this paper, we present the first wide coverage evaluation and comparison of some of the most popular NLU services, on a large, multi-domain (21 domains) dataset of 25K user utterances that we have collected and annotated with Intent and Entity Type specifications and which will be released as part of this submission. The results show that on Intent classification Watson significantly outperforms the other platforms, namely, Dialogflow, LUIS and Rasa; though these also perform well. Interestingly, on Entity Type recognition, Watson performs significantly worse due to its low Precision. Again, Dialogflow, LUIS and…
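The abstract attributes Watson's weaker Entity Type results to low Precision. As a rough illustration of why low precision alone can pull a combined score down, the sketch below computes micro-averaged precision, recall, and F1 over (utterance id, entity type) pairs. The function name, the pair representation, and the toy data are all assumptions made for illustration; they are not the paper's scoring script, and the numbers are not results from the paper.

```python
from collections import Counter


def precision_recall_f1(gold, predicted):
    """Micro-averaged precision, recall and F1 over (utterance_id, entity_type) pairs.

    Hypothetical helper for illustration only; the paper's exact matching
    criteria are not described in this abstract.
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # A prediction counts as correct when the same pair appears in the gold set.
    true_positives = sum((gold_counts & pred_counts).values())
    precision = true_positives / sum(pred_counts.values()) if pred_counts else 0.0
    recall = true_positives / sum(gold_counts.values()) if gold_counts else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up gold annotations: one entity type per utterance.
gold = [(1, "date"), (2, "place_name"), (3, "person")]

# A system that finds every gold entity but also emits three spurious ones.
over_predicting = gold + [(1, "time"), (2, "date"), (3, "place_name")]

p, r, f = precision_recall_f1(gold, over_predicting)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this toy case recall is perfect, yet half of the predictions are spurious, so precision is 0.5 and F1 falls well below 1. This is the general pattern the abstract describes, shown under the stated assumptions.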

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques