Deep Learning Enabled Semantic Communications with Speech Recognition and Synthesis
Zhenzi Weng, Zhijin Qin, Xiaoming Tao, Chengkang Pan, Guangyi Liu, and Geoffrey Ye Li

TL;DR
This paper introduces DeepSC-ST, a deep learning based semantic communication system for speech that efficiently transmits and reconstructs speech signals and outperforms traditional methods, especially under noisy channel conditions.
Contribution
The paper presents a novel deep learning framework for semantic speech communication that integrates speech recognition and speech synthesis and uses a robust model to cope with varying channel conditions.
Findings
Significantly reduces the amount of transmitted data without performance loss
Outperforms conventional communication systems and existing deep learning (DL) based systems in the low-SNR regime
Effectiveness demonstrated through simulations and a software prototype
Abstract
In this paper, we develop a deep learning based semantic communication system for speech transmission, named DeepSC-ST. We take speech recognition and speech synthesis as the transmission tasks of the communication system. First, semantic features related to speech recognition are extracted by a joint semantic-channel encoder and transmitted, and the text is recovered at the receiver from the received semantic features, which significantly reduces the amount of data to be transmitted without performance degradation. Then, we perform speech synthesis at the receiver, which regenerates the speech signals by feeding the recognized text and the speaker information into a neural network module. To make DeepSC-ST adaptive to dynamic channel environments, we identify a robust model that copes with different channel conditions. According to the…
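The abstract does not specify the channel model or how the robust model is obtained. As a hedged illustration only, assuming an additive white Gaussian noise (AWGN) channel, a common choice when evaluating deep learning based semantic communication systems, the sketch below shows how transmitted semantic features could be power-normalized and corrupted at a chosen SNR, and how a training loop might draw the SNR at random so that a single model sees many channel conditions. The function names `awgn_channel` and `sample_training_snr` and the -5 to 15 dB range are hypothetical; this is not the authors' implementation.

```python
import numpy as np


def awgn_channel(symbols: np.ndarray, snr_db: float, rng: np.random.Generator) -> np.ndarray:
    """Pass real-valued symbols through an AWGN channel at snr_db (assumed model)."""
    # Normalize so the average transmit power per symbol equals 1.
    power = np.mean(symbols ** 2)
    x = symbols / np.sqrt(power)
    # With unit signal power, the noise variance is 10^(-SNR/10).
    noise_std = np.sqrt(10.0 ** (-snr_db / 10.0))
    return x + rng.normal(0.0, noise_std, size=x.shape)


def sample_training_snr(rng: np.random.Generator, low_db: float = -5.0, high_db: float = 15.0) -> float:
    """Draw a training SNR uniformly from a hypothetical range [low_db, high_db)."""
    return float(rng.uniform(low_db, high_db))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for encoder output: a batch of 4 feature vectors of length 128.
    features = rng.normal(size=(4, 128))
    for step in range(3):
        snr_db = sample_training_snr(rng)
        received = awgn_channel(features, snr_db, rng)
        print(f"step {step}: SNR = {snr_db:.1f} dB, received shape = {received.shape}")
```

Randomizing the SNR per step is one simple way to avoid a model that only works at the SNR it was trained on; fading channels such as Rayleigh would multiply the normalized symbols by a random gain before the noise is added.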
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Advanced Data Compression Techniques
