Rate–Distortion–Perception Optimized Neural Speech Transmission System for High-Fidelity Semantic Communications

Shengshi Yao; Zixuan Xiao; Kai Niu

PMC · DOI: 10.3390/s24103169 · May 16, 2024

Open Access

TL;DR

This paper introduces NST, a neural speech transmission system that adapts its transmission rate to speech content and channel conditions in order to improve rate–distortion–perception performance.

Contribution

The novel NST framework optimizes rate–distortion–perception performance through adaptive transmission and causal coding for real-time use.

Findings

01. NST outperforms existing methods in rate–distortion–perception performance.

02. Streaming NST achieves low-latency transmission with minimal quality loss.

03. Adaptive coding based on semantic content improves transmission efficiency.

Abstract

We consider the problem of learned speech transmission. Existing methods exploit joint source–channel coding (JSCC) to encode speech directly into transmitted symbols, improving robustness over noisy channels. However, a fundamental limitation of these methods is that they fail to identify content diversity across speech frames, which leads to inefficient transmission. In this paper, we propose a novel neural speech transmission framework named NST. It can be optimized for superior rate–distortion–perception (RDP) performance toward the goal of high-fidelity semantic communication. In particular, a learned entropy model assesses latent speech features to quantify the semantic content complexity, which facilitates adaptive transmission rate allocation. NST enables a seamless integration of the source content with channel state information through variable-length joint…
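The entropy-guided rate allocation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`frame_entropy_bits`, `allocate_symbols`), the scaling factor `eta`, the clipping bounds `k_min` and `k_max`, and the toy likelihood values are all assumptions made here for illustration. The sketch assumes an entropy model has already assigned a likelihood to each latent element of a frame, and that the number of channel symbols per frame grows with the frame's estimated information content.

```python
import math


def frame_entropy_bits(likelihoods):
    """Estimated information content of one frame, in bits.

    `likelihoods` holds the probability that a (hypothetical) learned
    entropy model assigns to each latent element of the frame.
    """
    return sum(-math.log2(p) for p in likelihoods)


def allocate_symbols(entropies, eta=2.0, k_min=1, k_max=64):
    """Map each frame's entropy estimate to a channel-symbol count.

    `eta` (symbols per bit) and the clipping bounds are illustrative
    assumptions, not values taken from the paper.
    """
    return [max(k_min, min(k_max, math.ceil(eta * h))) for h in entropies]


# Toy likelihoods: a predictable (near-silent) frame and a content-rich frame.
frames = [
    [0.9, 0.9, 0.8, 0.95],
    [0.1, 0.05, 0.2, 0.1],
]

entropies = [frame_entropy_bits(f) for f in frames]
symbols = allocate_symbols(entropies)

for h, k in zip(entropies, symbols):
    print(f"entropy = {h:.2f} bits -> {k} channel symbols")
```

In this toy setup the predictable frame receives far fewer symbols than the content-rich one, which is the intuition behind adapting the transmission rate to content complexity. How NST additionally conditions on channel state information is not covered by this sketch.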

Figures: 10

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing