SoundSpring: Loss-Resilient Audio Transceiver with Dual-Functional Masked Language Modeling

Shengshi Yao; Jincheng Dai; Xiaoqi Qin; Sixian Wang; Siye Wang; Kai Niu; Ping Zhang

arXiv:2501.12696·eess.AS·January 23, 2025

TL;DR

SoundSpring is a novel error-resilient audio transceiver that leverages large language models for efficient compression and packet loss concealment, outperforming existing systems in fidelity and perceptual quality.

Contribution

The paper introduces a single dual-functional model, built on foundation language models, that performs audio compression at the transmitter and packet loss concealment at the receiver.

Findings

01 Outperforms existing systems in fidelity metrics

02 Enhances perceptual quality of transmitted audio

03 Demonstrates effectiveness of language models in error resilience

Abstract

In this paper, we propose "SoundSpring", a cutting-edge error-resilient audio transceiver that combines the robustness benefits of joint source-channel coding (JSCC) with compatibility with current digital communication systems. Unlike recent deep JSCC transceivers, which learn to map audio signals directly to analog channel-input symbols via neural networks, SoundSpring adopts a layered architecture that separates audio compression from digital coded transmission while fully exploiting the in-context predictive capabilities of large language (foundation) models. Integrated with a causal-order mask learning strategy, our single model operates in the latent feature domain and serves dual functions: as an efficient audio compressor at the transmitter and as an effective mechanism for packet loss concealment at the receiver. By jointly optimizing towards…
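The dual role described in the abstract can be illustrated with a toy sketch: one predictive model over discrete latent tokens gives both an estimated coding cost (the sum of -log2 p per token, as an entropy coder would use) and a guess for tokens lost in transit. Everything here is an assumption made for illustration: the add-one-smoothed bigram predictor stands in for the paper's masked language model, and the names `fit_bigram`, `predict`, `code_length_bits`, and `conceal` are hypothetical, not taken from the paper.

```python
import math
from collections import defaultdict


def fit_bigram(sequences, vocab_size):
    """Estimate p(next | prev) with add-one smoothing.

    Toy stand-in (assumption) for a learned predictive model over latent tokens.
    """
    counts = defaultdict(lambda: [1] * vocab_size)  # start every count at 1
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return {prev: [c / sum(row) for c in row] for prev, row in counts.items()}


def predict(model, prev, vocab_size):
    """Return the predicted distribution for the next token (uniform if prev is unseen)."""
    return model.get(prev, [1.0 / vocab_size] * vocab_size)


def code_length_bits(seq, model, vocab_size):
    """Transmitter side: ideal entropy-coded length of seq under the model."""
    bits = math.log2(vocab_size)  # first token has no context, so cost it uniformly
    for prev, nxt in zip(seq, seq[1:]):
        bits += -math.log2(predict(model, prev, vocab_size)[nxt])
    return bits


def conceal(received, model, vocab_size):
    """Receiver side: replace lost tokens (None) with the model's most likely token."""
    out = list(received)
    for i, tok in enumerate(out):
        if tok is None:
            prev = out[i - 1] if i > 0 else None
            probs = predict(model, prev, vocab_size)
            out[i] = max(range(vocab_size), key=probs.__getitem__)
    return out


# Hypothetical token stream with a repeating pattern, used only for this demo.
VOCAB = 4
model = fit_bigram([[0, 1, 2, 3] * 4], VOCAB)

predictable_bits = code_length_bits([0, 1, 2, 3], model, VOCAB)
uniform_bits = 4 * math.log2(VOCAB)
recovered = conceal([0, None, 2, None], model, VOCAB)
```

In this sketch the same probability table drives both functions: a sequence the model predicts well costs fewer bits than a uniform code, and the same predictions fill in missing tokens. The actual system operates on learned latent features with a far more capable model; this only shows why one predictor can serve both purposes.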
