OpenS2S: Advancing Fully Open-Source End-to-End Empathetic Large Speech Language Model

Chen Wang; Tianyu Peng; Wen Yang; Yinan Bai; Guangfu Wang; Jun Lin; Lanpeng Jia; Lingxiang Wu; Jinqiao Wang; Chengqing Zong; Jiajun Zhang

arXiv:2507.05177 · cs.CL · October 28, 2025

TL;DR

OpenS2S is a fully open-source, end-to-end empathetic speech language model built for transparent research and low-latency, emotionally expressive speech interaction. It relies on automated data synthesis and scalable training methods.

Contribution

The paper introduces an open-source empathetic speech model, together with an automated data pipeline and a streaming decoding architecture for low-latency, expressive speech generation.

Findings

01. Achieved low-latency speech generation with streaming interleaved decoding.

02. Constructed a diverse, high-quality empathetic speech dataset with minimal supervision.

03. Released comprehensive open-source resources including model, dataset, and code.

Abstract

Empathetic interaction is a cornerstone of human-machine communication because it requires understanding speech enriched with paralinguistic cues and generating emotional, expressive responses. However, the most powerful empathetic large speech language models (LSLMs) are increasingly closed off, leaving crucial details about their architecture, data, and development opaque to researchers. Given the critical need for transparent research into LSLMs and empathetic behavior, we present OpenS2S, a fully open-source, transparent, end-to-end LSLM designed to enable empathetic speech interactions. Based on our empathetic speech-to-text model BLSP-Emo, OpenS2S further employs a streaming interleaved decoding architecture to achieve low-latency speech generation. To facilitate end-to-end training, OpenS2S incorporates an automated data construction pipeline that synthesizes diverse, high-quality empathetic speech…
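The abstract names streaming interleaved decoding as the source of low latency but does not spell out the mechanism here. As a rough illustration only, the sketch below shows one generic way to interleave text and speech tokens so that audio can begin before the full text response is finished. The function name `interleave_stream` and the chunk sizes `text_chunk` and `speech_chunk` are hypothetical and are not taken from the OpenS2S code or paper.

```python
from typing import Iterator, List, Tuple, Union

Chunk = Tuple[str, List[Union[str, int]]]


def interleave_stream(
    text_tokens: List[str],
    speech_tokens: List[int],
    text_chunk: int = 2,    # hypothetical chunk size, not from the paper
    speech_chunk: int = 4,  # hypothetical chunk size, not from the paper
) -> Iterator[Chunk]:
    """Yield alternating ("text", ...) and ("speech", ...) chunks.

    Speech chunks are emitted as soon as a small text chunk is out,
    so a downstream vocoder could start synthesizing audio early.
    """
    t, s = 0, 0
    while t < len(text_tokens) or s < len(speech_tokens):
        if t < len(text_tokens):
            yield ("text", list(text_tokens[t:t + text_chunk]))
            t += text_chunk
        if s < len(speech_tokens):
            yield ("speech", list(speech_tokens[s:s + speech_chunk]))
            s += speech_chunk


# Usage: the first speech chunk arrives after only two text tokens.
chunks = list(interleave_stream(["I", "hear", "you"], [11, 12, 13, 14, 15, 16]))
for kind, payload in chunks:
    print(kind, payload)
```

In a real system the two token streams would come from the model step by step, not from precomputed lists. The point of the sketch is only the ordering: speech output does not wait for the text stream to end.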
