Deep Learning Enabled Semantic Communications with Speech Recognition and Synthesis

Zhenzi Weng, Zhijin Qin, Xiaoming Tao, Chengkang Pan, Guangyi Liu, and Geoffrey Ye Li

arXiv:2205.04603·eess.AS·April 3, 2023

Open Access · 1 Repo

TL;DR

This paper introduces DeepSC-ST, a deep learning-based semantic communication system for speech that transmits recognition-related semantic features and reconstructs the speech signal at the receiver, outperforming traditional methods, especially in noisy environments.

Contribution

The paper presents a novel deep learning framework for semantic speech communication, integrating recognition and synthesis, with adaptive robustness to channel variations.

Findings

01

Significant data transmission reduction without performance loss

02

Outperforms conventional and existing DL communication systems in low SNR

03

Demonstrated effectiveness through simulation and software prototype

Abstract

In this paper, we develop a deep learning based semantic communication system for speech transmission, named DeepSC-ST. We take speech recognition and speech synthesis as the transmission tasks of the communication system. First, the speech recognition-related semantic features are extracted for transmission by a joint semantic-channel encoder, and the text is recovered at the receiver from the received semantic features, which significantly reduces the amount of data that must be transmitted without performance degradation. Then, we perform speech synthesis at the receiver, which regenerates the speech signals by feeding the recognized text and the speaker information into a neural network module. To make DeepSC-ST adaptive to dynamic channel environments, we identify a robust model that copes with different channel conditions. According to the…
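To make the role of channel conditions in the abstract concrete, the following is a minimal sketch of how features sent over a noisy link degrade as the signal-to-noise ratio (SNR) drops. It is not the authors' implementation: the additive white Gaussian noise (AWGN) model, the names `awgn_channel` and `mse`, and the random stand-in for encoder output are all assumptions made for illustration.

```python
import math
import random


def awgn_channel(symbols, snr_db, rng):
    """Add white Gaussian noise to real-valued symbols at the given SNR in dB.

    Hypothetical helper: an AWGN link is assumed here for illustration only.
    """
    signal_power = sum(s * s for s in symbols) / len(symbols)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in symbols]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in for the output of a semantic-channel encoder (assumption).
    features = [rng.uniform(-1.0, 1.0) for _ in range(10000)]
    for snr_db in (0, 10, 20):
        received = awgn_channel(features, snr_db, rng)
        print(f"SNR {snr_db:>2} dB -> feature MSE {mse(features, received):.4f}")
```

In this toy setting the distortion seen by the receiver grows as the SNR falls, which is the regime where a decoder trained to be robust across channel conditions would matter most.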

Code & Models

Repositories

zhenzi-weng/deepsc-st_demonstration
TensorFlow · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Advanced Data Compression Techniques