EVI: Multilingual Spoken Dialogue Tasks and Dataset for Knowledge-Based Enrolment, Verification, and Identification

Georgios P. Spithourakis; Ivan Vulić; Michał Lis; Iñigo Casanueva; Paweł Budzianowski

arXiv:2204.13496 · cs.CL · April 29, 2022

TL;DR

This paper introduces EVI, a multilingual spoken dialogue dataset for knowledge-based enrolment, verification, and identification, along with formal task definitions and benchmark models to advance research in multilingual spoken dialogue systems.

Contribution

The paper formalizes three authentication tasks, provides a multilingual dataset, and establishes initial benchmarks for knowledge-based spoken dialogue authentication.

Findings

01 First competitive benchmarks for multilingual spoken dialogue authentication.

02 Challenges of multilingual natural language processing in spoken dialogue.

03 Directions for future research in multilingual dialogue systems.

Abstract

Knowledge-based authentication is crucial for task-oriented spoken dialogue systems that offer personalised and privacy-focused services. Such systems should be able to enrol (E), verify (V), and identify (I) new and recurring users based on their personal information, e.g. postcode, name, and date of birth. In this work, we formalise the three authentication tasks and their evaluation protocols, and we present EVI, a challenging spoken multilingual dataset with 5,506 dialogues in English, Polish, and French. Our proposed models set the first competitive benchmarks, explore the challenges of multilingual natural language processing of spoken dialogue, and set directions for future research.
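
To make the three tasks concrete, the following is a minimal sketch of enrolment, verification, and identification over the profile fields the abstract names (name, postcode, date of birth). It is not the authors' method: the `Profile` and `ProfileStore` names are hypothetical, and it assumes the spoken input has already been transcribed and parsed, then compares fields by exact match after simple normalisation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional


@dataclass(frozen=True)
class Profile:
    """Hypothetical user record holding the three knowledge items."""
    name: str
    postcode: str
    date_of_birth: date


def normalise(text: str) -> str:
    """Lowercase and drop all whitespace, so 'AB1 2CD' matches 'ab12cd'."""
    return "".join(text.lower().split())


def same_person(a: Profile, b: Profile) -> bool:
    """Assumed matching rule: every field must agree after normalisation."""
    return (
        normalise(a.name) == normalise(b.name)
        and normalise(a.postcode) == normalise(b.postcode)
        and a.date_of_birth == b.date_of_birth
    )


class ProfileStore:
    """Hypothetical in-memory store illustrating the E, V, and I tasks."""

    def __init__(self) -> None:
        self._profiles: Dict[str, Profile] = {}

    def enrol(self, user_id: str, profile: Profile) -> None:
        """E: register a new user's profile under an identifier."""
        self._profiles[user_id] = profile

    def verify(self, user_id: str, claimed: Profile) -> bool:
        """V: check the claimed details against one known user's profile."""
        stored = self._profiles.get(user_id)
        return stored is not None and same_person(stored, claimed)

    def identify(self, claimed: Profile) -> Optional[str]:
        """I: search all profiles; return an id only if exactly one matches."""
        matches = [uid for uid, p in self._profiles.items() if same_person(p, claimed)]
        return matches[0] if len(matches) == 1 else None


store = ProfileStore()
store.enrol("u1", Profile("Jane Doe", "AB1 2CD", date(1990, 12, 10)))
heard = Profile("jane doe", "ab12cd", date(1990, 12, 10))
print(store.verify("u1", heard))
print(store.identify(heard))
```

Exact matching is the simplest possible rule and would be brittle against speech recognition errors; it is used here only to show how the three tasks differ in what they take as input and what they return.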

Code & Models

Repositories

PolyAI-LDN/evi-paper
Official

Datasets

PolyAI/evi
dataset · 128 downloads

Taxonomy

Topics: Topic Modeling · Speech Recognition and Synthesis · Speech and Dialogue Systems