Semantic-preserved Communication System for Highly Efficient Speech Transmission

Tianxiao Han; Qianqian Yang; Zhiguo Shi; Shibo He; Zhaoyang Zhang

arXiv:2205.12727·eess.AS·May 26, 2022·6 cites

TL;DR

This paper introduces a deep learning-based semantic communication system for speech that transmits only the semantic information needed for recognition, plus a compact set of additional information for reconstruction. This significantly reduces the amount of transmitted data while maintaining high speech recognition accuracy and speech quality.

Contribution

The paper proposes a novel end-to-end semantic-oriented speech transmission method that efficiently encodes semantic information and improves transmission efficiency for speech recognition and reconstruction.

Findings

01. Outperforms existing methods in both speech-to-text accuracy and reconstructed speech quality.

02. Uses only 16% of the symbols that traditional methods require for speech-to-text transmission.

03. Achieves a 10% reduction in word error rate (WER) while transmitting significantly less data.
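
The WER figure above refers to word error rate, the usual metric for speech-to-text accuracy: the number of word substitutions, deletions, and insertions needed to turn the predicted transcription into the reference, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper (hypothetical name); tokenizes on whitespace only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference
    # words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Lower values are better, and the metric can exceed 1.0 when the hypothesis contains many inserted words.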

Abstract

Deep learning (DL) based semantic communication methods have been explored for the efficient transmission of images, text, and speech in recent years. In contrast to traditional wireless communication methods that focus on the transmission of abstract symbols, semantic communication approaches attempt to achieve better transmission efficiency by only sending the semantic-related information of the source data. In this paper, we consider semantic-oriented speech transmission which transmits only the semantic-relevant information over the channel for the speech recognition task, and a compact additional set of semantic-irrelevant information for the speech reconstruction task. We propose a novel end-to-end DL-based transceiver which extracts and encodes the semantic information from the input speech spectrums at the transmitter and outputs the corresponding transcriptions from the decoded…

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Advanced Data Compression Techniques