Towards a Japanese Full-duplex Spoken Dialogue System

Atsumoto Ohashi; Shinya Iizuka; Jingjing Jiang; Ryuichiro Higashinaka

arXiv:2506.02979 · cs.CL · June 4, 2025

TL;DR

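This paper introduces the first Japanese full-duplex spoken dialogue system. It is built upon Moshi, an English full-duplex model, pre-trained on large-scale Japanese spoken dialogue data, fine-tuned on stereo dialogue data, and further enhanced with synthetic dialogue data. The resulting system shows improved naturalness and meaningfulness over baseline models.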

Contribution

It presents the first publicly available Japanese full-duplex dialogue model based on Moshi, with a novel two-stage training process and synthetic data augmentation.

Findings

1. Outperforms baseline models in naturalness
2. Outperforms baseline models in meaningfulness
3. Makes effective use of synthetic dialogue data

Abstract

Full-duplex spoken dialogue systems, which can model simultaneous bidirectional features of human conversation such as speech overlaps and backchannels, have recently attracted significant attention. However, research on full-duplex spoken dialogue systems for Japanese remains scarce. In this paper, we present the first publicly available full-duplex spoken dialogue model in Japanese, built upon Moshi, a full-duplex dialogue model in English. Our model is trained in two stages: pre-training on large-scale Japanese spoken dialogue data, followed by fine-tuning on high-quality stereo spoken dialogue data. We further enhance the model's performance by incorporating synthetic dialogue data generated by a multi-stream text-to-speech system. Evaluation experiments demonstrate that…
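Stereo dialogue recordings typically place each speaker on a separate channel. With that layout, the overlaps and backchannels that a full-duplex model must handle can be read directly from per-channel speech timings. The sketch below illustrates this idea and is not the paper's pipeline. It makes three assumptions. Voice-activity segments for each channel are already available as sorted `(start, end)` pairs in seconds. The 1-second backchannel threshold is arbitrary. The function names `overlaps` and `backchannel_candidates` are hypothetical.

```python
from typing import List, Tuple

# One speech segment on a single channel: (start_sec, end_sec).
Segment = Tuple[float, float]


def overlaps(a: List[Segment], b: List[Segment]) -> List[Segment]:
    """Return intervals where both channels contain speech.

    Assumes each list is sorted by start time and that segments within
    one list do not overlap each other. Uses a two-pointer sweep.
    """
    i, j, out = 0, 0, []
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:  # the two current segments intersect
            out.append((start, end))
        # Advance whichever segment finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def backchannel_candidates(
    listener: List[Segment], speaker: List[Segment], max_len: float = 1.0
) -> List[Segment]:
    """Hypothetical heuristic: short listener segments that fall entirely
    inside one of the speaker's segments are treated as backchannel candidates."""
    return [
        s
        for s in listener
        if s[1] - s[0] <= max_len
        and any(p[0] <= s[0] and s[1] <= p[1] for p in speaker)
    ]


if __name__ == "__main__":
    left = [(0.0, 2.5), (4.0, 6.0)]   # channel 1 (e.g. system)
    right = [(2.0, 2.4), (5.5, 8.0)]  # channel 2 (e.g. user)
    print(overlaps(left, right))                # [(2.0, 2.4), (5.5, 6.0)]
    print(backchannel_candidates(right, left))  # [(2.0, 2.4)]
```

In this toy example, the short user utterance at 2.0 to 2.4 s falls inside the system's turn, so the heuristic flags it as a backchannel candidate. The overlap at 5.5 to 6.0 s is a longer turn overlap instead. A single-channel (mono) recording would not separate the speakers this way, which is one reason stereo data suits full-duplex training.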

Taxonomy

Topics: Speech and dialogue systems · Multi-Agent Systems and Negotiation

Methods: Softmax · Attention Is All You Need