AliCHI: A Large-scale Multi-modal Dataset and Automated Evaluation Tool for Human-like Dialogue Systems
Zhiling Luo, Qiankun Shi, Sha Zhao, Wei Zhou, Haiqing Chen, Yuankai Ma, and Haitao Leng

TL;DR
This paper introduces AliCHI, a large-scale multi-modal dataset of face-to-face human conversations with detailed annotations, along with an evaluation tool for assessing human-like dialogue systems in terms of turn-taking and backchannel prediction.
Contribution
It provides a comprehensive multi-modal dataset with fine-grained annotations and an automated evaluation tool for human-like dialogue systems, addressing limitations of existing single-modality datasets.
Findings
Dataset contains 635 dialogue sessions (52 hours of video in total) from 200 participants.
Evaluation tool assesses turn-taking and backchannel prediction accuracy.
Open-sourced data and tools facilitate future research.
Abstract
A well-designed interactive human-like dialogue system is expected to take actions (e.g. smiling) and respond in a pattern similar to humans. However, because currently public datasets are either single-modality (speech only) or small in volume, most dialogue systems can only respond in speech and cannot take human-like actions. In this work, we build a large-scale multi-modal dataset of face-to-face human-to-human conversation with fine-grained annotations. The raw data, in video format, contains 635 dialogue sessions collected from 200 participants on designed topics and lasting 52 hours in total. Moreover, we manually annotated the verbal and non-verbal behaviors in each dialogue session with their start and end timestamps. Furthermore, we developed a corresponding evaluation tool for human-like dialogue systems that automatically evaluates the accuracy of two basic…
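To make the idea of timestamped behavior annotations and automated scoring concrete, here is a minimal sketch in Python. It is not the AliCHI tool or its annotation schema: the `Event` fields, the `matched_fraction` function, and the 0.5-second tolerance are all assumptions chosen for illustration, and the paper's tool may define accuracy differently. The sketch counts a reference event as matched when a distinct predicted event with the same label starts within the tolerance window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One annotated behavior (hypothetical fields, not the AliCHI schema)."""
    label: str    # e.g. "turn_taking" or "backchannel"
    start: float  # seconds from the start of the session
    end: float    # seconds from the start of the session


def matched_fraction(reference, predicted, label, tolerance=0.5):
    """Fraction of reference events with `label` whose start time is matched
    by a distinct predicted event of the same label within `tolerance` seconds.

    This is a recall-style score under an assumed matching rule."""
    refs = sorted((e for e in reference if e.label == label), key=lambda e: e.start)
    preds = sorted((e for e in predicted if e.label == label), key=lambda e: e.start)
    if not refs:
        return 0.0
    used = set()  # indices of predictions already paired with a reference event
    hits = 0
    for ref in refs:
        for i, pred in enumerate(preds):
            if i not in used and abs(pred.start - ref.start) <= tolerance:
                used.add(i)
                hits += 1
                break
    return hits / len(refs)


# Toy data, invented for this example only.
reference = [
    Event("backchannel", 1.0, 1.4),
    Event("turn_taking", 3.0, 3.2),
    Event("backchannel", 5.0, 5.3),
]
predicted = [
    Event("backchannel", 1.2, 1.5),
    Event("turn_taking", 3.1, 3.4),
    Event("backchannel", 7.0, 7.2),
]

backchannel_score = matched_fraction(reference, predicted, "backchannel")
turn_taking_score = matched_fraction(reference, predicted, "turn_taking")
```

With this toy data, one of the two reference backchannels is matched and the single turn-taking event is matched. A stricter variant could also compare end timestamps or penalize unmatched predictions.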
Taxonomy
Topics: Speech and dialogue systems · Multimodal Machine Learning Applications · Topic Modeling
