Large Model Empowered Streaming Speech Semantic Communications

Zhenzi Weng, Zhijin Qin, and Geoffrey Ye Li

arXiv:2501.05859 · eess.AS · February 24, 2025 · IEEE Wireless Communications Letters

TL;DR

This paper presents LSSC-ST, a streaming semantic speech communication system that uses large models and edge collaboration to enable multilingual, low-latency speech transmission with improved accuracy.

Contribution

The paper introduces a novel edge-device collaborative architecture with dynamic speech segmentation for low-latency, multilingual streaming speech communication using large pre-trained models.

Findings

1. Lower transmission latency compared to non-streaming systems
2. More accurate speech transmission in multilingual scenarios
3. Effective adaptive speech segmentation reduces latency

Abstract

In this paper, we introduce a large model-empowered streaming semantic communication system for speech transmission across various languages, named LSSC-ST. Specifically, we devise an edge-device collaborative semantic communication architecture by offloading the intricate semantic extraction and channel coding modules to edge servers, thereby reducing the computational burden on local devices. To support multilingual speech transmission, pre-trained large speech models are utilized to learn unified semantic features from speech in different languages, breaking the constraint of a single input language and enhancing the practicality of the LSSC-ST. Moreover, the input speech is sequentially streamed into the developed system as short speech segments, which enables low transmission latency without degrading the quality of the produced speech. A novel dynamic speech segmentation algorithm…
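The abstract is truncated before it describes the dynamic segmentation algorithm, so the sketch below does not reproduce the LSSC-ST method. It is a minimal illustration, assuming a simple energy-based rule as a stand-in. The rule cuts after a quiet frame once a segment reaches a minimum length and forces a cut at a maximum length. It then compares first-output latency for streaming against non-streaming transmission. All names (`segment_stream`, `streaming_first_output_latency`, `non_streaming_latency`) and thresholds are hypothetical. The fixed processing delay `proc_s` is a simplifying assumption.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # (start_sample, end_sample), end exclusive


def frame_energies(samples: List[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each non-overlapping frame; a trailing partial frame is ignored."""
    n_frames = len(samples) // frame_len
    return [
        sum(s * s for s in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def segment_stream(samples: List[float], sample_rate: int, frame_ms: int = 20,
                   min_s: float = 0.5, max_s: float = 2.0,
                   silence_ratio: float = 0.1) -> List[Segment]:
    """Hypothetical energy-based dynamic segmentation (NOT the LSSC-ST algorithm).

    Cut after a quiet frame once the segment is at least min_s long,
    and force a cut when it reaches max_s, so every segment stays short.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    energies = frame_energies(samples, frame_len)
    if not energies:
        return [(0, len(samples))] if samples else []
    threshold = silence_ratio * max(energies)  # "quiet" relative to the loudest frame
    min_frames = int(min_s * 1000 / frame_ms)
    max_frames = int(max_s * 1000 / frame_ms)

    segments: List[Segment] = []
    start = 0  # index of the first frame in the current segment
    for i, energy in enumerate(energies):
        length = i + 1 - start
        if (length >= min_frames and energy <= threshold) or length >= max_frames:
            segments.append((start * frame_len, (i + 1) * frame_len))
            start = i + 1
    if start * frame_len < len(samples):
        # Flush the remainder, including any trailing partial frame.
        segments.append((start * frame_len, len(samples)))
    return segments


def streaming_first_output_latency(segments: List[Segment], sample_rate: int,
                                   proc_s: float) -> float:
    """Receiver can start once the first segment is captured and processed (assumed fixed delay)."""
    return segments[0][1] / sample_rate + proc_s


def non_streaming_latency(num_samples: int, sample_rate: int, proc_s: float) -> float:
    """Whole utterance must be captured before processing and transmission start."""
    return num_samples / sample_rate + proc_s


if __name__ == "__main__":
    sr = 1000  # toy sample rate so the numbers stay small
    # 1.0 s of "speech", a 0.2 s pause, then another 1.0 s of "speech"
    audio = [1.0] * 1000 + [0.0] * 200 + [1.0] * 1000
    segs = segment_stream(audio, sr)
    print(segs)  # [(0, 1020), (1020, 2200)]
    s = streaming_first_output_latency(segs, sr, proc_s=0.1)
    n = non_streaming_latency(len(audio), sr, proc_s=0.1)
    print(f"streaming {s:.2f} s vs non-streaming {n:.2f} s")  # streaming 1.12 s vs non-streaming 2.30 s
```

Lowering `max_s` shortens the wait for the first segment but produces more, shorter segments. How the paper's own algorithm balances this trade-off is not described in the excerpt above.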

Code & Models

Repositories

Zhenzi-Weng/LaSC-ST (official)

Taxonomy

Topics: Speech Recognition and Synthesis