MoVE: Translating Laughter and Tears via Mixture of Vocalization Experts in Speech-to-Speech Translation

Szu-Chi Chen; I-Ning Tsai; Yi-Cheng Lin; Sung-Feng Huang; Hung-yi Lee

arXiv:2604.17435 · cs.CL · April 21, 2026

TL;DR

This paper introduces MoVE, a novel speech-to-speech translation system that effectively preserves non-verbal vocalizations like laughter and crying, enhancing emotional and pragmatic communication in translated speech.

Contribution

MoVE is a Mixture-of-LoRA-Experts architecture for speech-to-speech translation (S2ST). Expressive-specialized LoRA adapters are blended by a soft-weighting router, which lets the model capture hybrid expressive states and preserve non-verbal vocalizations in the translated speech.
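
To make the idea of soft-weighted LoRA experts concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The class names (`LoRAExpert`, `MixtureOfLoRAExperts`), the linear router followed by a softmax, the `alpha / rank` scaling, and the zero initialization of `B` are assumptions taken from the standard LoRA formulation, and the number of experts is arbitrary. The sketch shows only the blending step: a frozen base projection plus a gate-weighted sum of low-rank updates.

```python
import math
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class LoRAExpert:
    """One low-rank adapter: delta(x) = (alpha / rank) * B @ (A @ x)."""

    def __init__(self, d_in, d_out, rank, alpha, rng):
        self.scale = alpha / rank
        self.A = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(rank)]
        # B starts at zero, so an untrained expert leaves the base output unchanged.
        self.B = [[0.0] * rank for _ in range(d_out)]

    def delta(self, x):
        return [self.scale * v for v in matvec(self.B, matvec(self.A, x))]


class MixtureOfLoRAExperts:
    """Frozen base weight W plus a softly gated sum of LoRA experts (hypothetical layout)."""

    def __init__(self, W, n_experts, rank=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # frozen pretrained projection
        self.experts = [LoRAExpert(d_in, d_out, rank, alpha, rng) for _ in range(n_experts)]
        # Assumed router: one linear score per expert, normalized with softmax.
        self.router = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(n_experts)]

    def gate(self, x):
        """Soft weights over experts; every expert contributes, none is dropped."""
        return softmax(matvec(self.router, x))

    def forward(self, x):
        y = matvec(self.W, x)
        for weight, expert in zip(self.gate(x), self.experts):
            update = expert.delta(x)
            y = [y_i + weight * u_i for y_i, u_i in zip(y, update)]
        return y


if __name__ == "__main__":
    W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
    layer = MixtureOfLoRAExperts(W, n_experts=3)
    x = [0.5, -1.0, 2.0]
    print("gate weights:", layer.gate(x))
    print("output:", layer.forward(x))
```

Because the gate is a softmax rather than a hard top-1 choice, an input can draw on several adapters at once, which is the property a soft-weighting router needs in order to blend experts for mixed expressive states.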

Findings

1. On English-Chinese S2ST, MoVE reproduces target non-verbal vocalizations (NVs) in 76% of cases.

2. MoVE achieves the highest human-rated naturalness and emotional fidelity among the compared systems.

3. Only 30 minutes of curated data is needed for strong performance.

Abstract

Recent Speech-to-Speech Translation (S2ST) systems achieve strong semantic accuracy yet consistently strip away non-verbal vocalizations (NVs), such as laughter and crying, that convey pragmatic intent, which severely limits real-world utility. We address this via three contributions. First, we propose a synthesis pipeline for building scalable expressive datasets to overcome data scarcity. Second, we propose MoVE, a Mixture-of-LoRA-Experts architecture with expressive-specialized adapters and a soft-weighting router that blends experts to capture hybrid expressive states. Third, we show that pretrained AudioLLMs enable striking data efficiency: 30 minutes of curated data is enough for strong performance. On English-Chinese S2ST, compared with strong baselines, MoVE reproduces target NVs in 76% of cases and achieves the highest human-rated naturalness and emotional…

Code & Models

Repositories

47zzz/MoVE
github

Datasets

47z/MoVE
dataset · 163 downloads
