Hello-Chat: Towards Realistic Social Audio Interactions

Yueran Hou; Peilei Jia; Zihan Sun; Qihang Lu; Wenbing Yang; Yingming Gao; Ya Li; Jun Gao

arXiv:2602.23387 · cs.SD · March 2, 2026

TL;DR

Hello-Chat is an end-to-end audio language model for realistic social audio interactions. It targets more natural, emotionally resonant, and human-like speech by training on a large dataset of real-life conversations with a modality-interleaved training strategy.

Contribution

The paper introduces Hello-Chat, an end-to-end audio language model that aims to make social audio interactions more realistic. It combines a large dataset of real-life conversations with modality-interleaved training to produce more natural and empathetic responses.

Findings

01. Reaches state-of-the-art performance on specific audio understanding tasks.

02. Outperforms existing baselines in prosodic naturalness.

03. Improves emotional alignment in social audio interactions.

Abstract

Recent advancements in Large Audio Language Models (LALMs) have demonstrated exceptional performance in speech recognition and translation. However, existing models often suffer from a disconnect between perception and expression, resulting in a robotic "read-speech" style that lacks the spontaneity and emotional resonance of real human interaction. In this report, we introduce Hello-Chat, an end-to-end audio language model designed for realistic social scenarios. By leveraging a massive dataset of real-life conversations and employing a modality-interleaved training strategy, Hello-Chat achieves a breakthrough in anthropomorphic generation. Experimental results show that our model not only reaches state-of-the-art (SOTA) performance on specific audio understanding tasks but also significantly outperforms existing baselines in prosodic naturalness and emotional alignment, paving the way…

Code & Models

Models

Hugging Face: hellogroup-opensource/Hello-Chat (model)

Taxonomy

Topics: Speech Recognition and Synthesis · AI in Service Interactions · Emotion and Mood Recognition