TikTalk: A Video-Based Dialogue Dataset for Multi-Modal Chitchat in Real World
Hongpeng Lin, Ludan Ruan, Wenke Xia, Peiyu Liu, Jingyuan Wen, Yixin Xu, Di Hu, Ruihua Song, Wayne Xin Zhao, Qin Jin, Zhiwu Lu

TL;DR
TikTalk is a new large-scale video-based multi-modal dialogue dataset that captures real-world chitchat, presenting unique challenges and opportunities for developing more human-like multi-modal conversational AI.
Contribution
The paper introduces TikTalk, a comprehensive video-based multi-modal dialogue dataset with diverse context types, and evaluates baseline models, highlighting the potential of LLMs and external knowledge integration.
Findings
Models built on large language models generate more diverse responses.
Knowledge graph-based models perform best overall.
Current models still struggle with complex multi-modal understanding.
Abstract
To facilitate the research on intelligent and human-like chatbots with multi-modal context, we introduce a new video-based multi-modal dialogue dataset, called TikTalk. We collect 38K videos from a popular video-sharing platform, along with 367K conversations posted by users beneath them. Users engage in spontaneous conversations based on their multi-modal experiences from watching videos, which helps recreate real-world chitchat context. Compared to previous multi-modal dialogue datasets, the richer context types in TikTalk lead to more diverse conversations, but also increase the difficulty in capturing human interests from intricate multi-modal information to generate personalized responses. Moreover, external knowledge is more frequently evoked in our dataset. These facts reveal new challenges for multi-modal dialogue models. We quantitatively demonstrate the characteristics of…
Taxonomy
Topics: Child Development and Digital Technology · Speech and dialogue systems · ICT in Developing Communities
