Multilingual and Continuous Backchannel Prediction: A Cross-lingual Study
Koji Inoue, Mikey Elmers, Yahui Fu, Zi Haur Pang, Taiga Mori, Divesh Lala, Keiko Ochi, Tatsuya Kawahara

TL;DR
This study develops a multilingual, Transformer-based backchannel prediction model for Japanese, English, and Chinese, revealing language-specific cues and demonstrating cross-lingual differences in timing behavior through extensive experiments.
Contribution
It introduces a unified multilingual backchannel prediction model that captures language-universal and language-specific cues, and provides empirical cross-linguistic insights into backchannel timing.
Findings
Multilingual model matches or surpasses monolingual baselines.
Japanese relies more on short-term linguistic cues, while English and Chinese focus on silence and prosody.
The model runs in real time with CPU-only inference.
Abstract
We present a multilingual, continuous backchannel prediction model for Japanese, English, and Chinese, and use it to investigate cross-linguistic timing behavior. The model is Transformer-based and operates at the frame level, jointly trained with auxiliary tasks on approximately 300 hours of dyadic conversations. Across all three languages, the multilingual model matches or surpasses monolingual baselines, indicating that it learns both language-universal cues and language-specific timing patterns. Zero-shot transfer with two-language training remains limited, underscoring substantive cross-lingual differences. Perturbation analyses reveal distinct cue usage: Japanese relies more on short-term linguistic information, whereas English and Chinese are more sensitive to silence duration and prosodic variation; multilingual training encourages shared yet adaptable representations and…
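As an illustration of what frame-level, continuous prediction means in practice, the sketch below shows one way a stream of per-frame backchannel probabilities could be turned into discrete trigger points. It is not the authors' method. The function name `trigger_frames`, the `threshold` value, and the `refractory` gap are hypothetical choices made only for this example, and the abstract does not state a frame rate or decision rule.

```python
from typing import List


def trigger_frames(probs: List[float],
                   threshold: float = 0.5,
                   refractory: int = 10) -> List[int]:
    """Return the frame indices at which a backchannel would be triggered.

    probs      -- per-frame backchannel probabilities (assumed model output)
    threshold  -- hypothetical decision threshold on the probability
    refractory -- hypothetical minimum gap, in frames, between two triggers
    """
    triggers: List[int] = []
    last = None  # index of the most recent trigger, if any
    for i, p in enumerate(probs):
        # Fire only when the probability is high enough and the previous
        # trigger is more than `refractory` frames in the past.
        if p >= threshold and (last is None or i - last > refractory):
            triggers.append(i)
            last = i
    return triggers


if __name__ == "__main__":
    example = [0.1, 0.6, 0.7, 0.2, 0.9]
    print(trigger_frames(example, threshold=0.5, refractory=2))
```

A refractory gap of this kind is a common way to keep a frame-level predictor from firing on several consecutive frames; whether the paper uses one is not stated in the abstract.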
Taxonomy
Topics: Phonetics and Phonology Research · Emotion and Mood Recognition · Action Observation and Synchronization
