# Rate–Distortion–Perception Optimized Neural Speech Transmission System for High-Fidelity Semantic Communications

**Authors:** Shengshi Yao, Zixuan Xiao, Kai Niu

PMC · DOI: 10.3390/s24103169 · 2024-05-16

## TL;DR

This paper introduces NST, a neural speech transmission system that adapts its transmission rate to speech content and channel conditions, targeting high-fidelity semantic communication.

## Contribution

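NST is optimized for rate–distortion–perception (RDP) performance: a learned entropy model drives adaptive rate allocation, and variable-length joint source–channel coding combines source content with channel state information. A streaming variant uses causal, sliding-window coding for real-time use.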
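
As a rough illustration of what an RDP objective can look like, one common weighted form is given below. This is a generic sketch, not necessarily the loss used in the paper; the symbols $R$, $D$, $P$, $\lambda_d$, and $\lambda_p$ are assumptions introduced here.

$$
\mathcal{L} = R + \lambda_d \, D(x, \hat{x}) + \lambda_p \, P(x, \hat{x})
$$

Here $x$ is the input speech, $\hat{x}$ the reconstruction, $R$ the transmission rate (for example, the number of channel symbols used), $D$ a distortion measure, $P$ a perceptual quality term, and $\lambda_d, \lambda_p \ge 0$ weights that trade the three terms off against each other.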

## Key findings

- NST outperforms existing speech transmission methods, both separation-based and JSCC, in rate–distortion–perception performance.
- Streaming NST achieves low-latency transmission with minimal quality loss.
- Adaptive coding based on semantic content improves transmission efficiency.

## Abstract

We consider the problem of learned speech transmission. Existing methods have exploited joint source–channel coding (JSCC) to encode speech directly into transmitted symbols, improving robustness over noisy channels. However, a fundamental limitation of these methods is that they fail to identify content diversity across speech frames, which leads to inefficient transmission. In this paper, we propose a novel neural speech transmission framework named NST. It can be optimized for superior rate–distortion–perception (RDP) performance toward the goal of high-fidelity semantic communication. In particular, a learned entropy model assesses latent speech features to quantify the semantic content complexity, which facilitates adaptive transmission rate allocation. NST enables a seamless integration of the source content with channel state information through variable-length joint source–channel coding, which maximizes the coding gain. Furthermore, we present a streaming variant of NST, which adopts causal coding based on sliding windows. Experimental results verify that NST outperforms existing speech transmission methods, including separation-based and JSCC solutions, in terms of RDP performance. Streaming NST achieves low-latency transmission with only a slight quality degradation, making it suitable for real-time speech communication.
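
The entropy-driven rate allocation described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `allocate_symbols`, the scaling factor `eta`, and the bounds `k_min` and `k_max` are hypothetical, the per-element likelihoods are made-up stand-ins for the output of a learned entropy model, and conditioning on channel state is omitted.

```python
import math

def allocate_symbols(frame_likelihoods, eta=0.5, k_min=1, k_max=64):
    """Map each frame's estimated information content to a channel-symbol count.

    frame_likelihoods: one list per frame, holding the probabilities an entropy
    model assigns to that frame's latent elements (hypothetical inputs here).
    eta, k_min, k_max: illustrative assumptions, not values from the paper.
    """
    lengths = []
    for probs in frame_likelihoods:
        # Estimated bits for the frame: sum of -log2(p) over its latent elements.
        bits = -sum(math.log2(p) for p in probs)
        # More bits means more channel symbols, clipped to the allowed range.
        k = math.ceil(eta * bits)
        lengths.append(max(k_min, min(k_max, k)))
    return lengths

# A predictable (near-silent) frame versus a content-rich frame.
quiet = [0.9, 0.95, 0.9, 0.99]
busy = [0.1, 0.05, 0.2, 0.1]
lengths = allocate_symbols([quiet, busy])
print(lengths)
```

Frames whose latents the model finds predictable receive few symbols, while less predictable frames receive more, which is the intuition behind variable-length coding across speech frames.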

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11124825/full.md

---
Source: https://tomesphere.com/paper/PMC11124825