PhoenixCodec: Taming Neural Speech Coding for Extreme Low-Resource Scenarios

Zixiang Wan; Haoran Zhao; Guochang Zhang; Runqiang Han; Jianqiang Wei; Yuexian Zou

arXiv:2510.21196 · eess.AS · February 24, 2026

Open Access

TL;DR

PhoenixCodec is a neural speech codec for extremely low-resource scenarios. It combines an asymmetric frequency-time architecture, Cyclical Calibration and Refinement (CCR) training, and noise-invariant fine-tuning to balance efficiency and robustness.

Contribution

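It introduces an integrated low-resource neural speech coding system that combines Cyclical Calibration and Refinement (CCR) training, an asymmetric frequency-time architecture, and noise-invariant fine-tuning. The system ranked third overall in LRAC 2025 Challenge Track 1 and achieved the best 1 kbps performance in noisy and reverberant conditions.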

Findings

01 Ranked third in LRAC 2025 Challenge Track 1

02 Achieved best performance at 1 kbps in noisy and reverberant conditions

03 Demonstrated high intelligibility in clean tests

Abstract

This paper presents PhoenixCodec, a comprehensive neural speech coding and decoding framework designed for extremely low-resource conditions. The proposed system integrates an optimized asymmetric frequency-time architecture, a Cyclical Calibration and Refinement (CCR) training strategy, and a noise-invariant fine-tuning procedure. Under stringent constraints (computation below 700 MFLOPs, latency under 30 ms, and dual-rate support at 1 kbps and 6 kbps), existing methods face a trade-off between efficiency and quality. PhoenixCodec addresses these challenges by alleviating the resource scattering of conventional decoders, employing CCR to stabilize optimization, and improving robustness through noisy-sample fine-tuning. In the LRAC 2025 Challenge Track 1, the proposed system ranked third overall and demonstrated the best performance at 1 kbps in both real-world noise and…
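To give a sense of how tight the two target bitrates are, the short sketch below converts a bitrate into a bit budget per frame. The 1 kbps and 6 kbps rates come from the abstract; the frame durations (10 ms and 20 ms) and the helper name `bits_per_frame` are hypothetical choices for illustration only and are not taken from the paper.

```python
def bits_per_frame(bitrate_bps: int, frame_ms: float) -> float:
    """Return the number of bits available to encode one frame."""
    return bitrate_bps * frame_ms / 1000.0


# The two rates come from the abstract.
RATES_BPS = {"1 kbps": 1000, "6 kbps": 6000}

# Assumption: these frame durations are illustrative, not the codec's real ones.
FRAME_MS_CANDIDATES = (10.0, 20.0)

# Bit budget for every (rate, frame duration) pair.
budget = {
    (name, frame_ms): bits_per_frame(bps, frame_ms)
    for name, bps in RATES_BPS.items()
    for frame_ms in FRAME_MS_CANDIDATES
}

for (name, frame_ms), bits in budget.items():
    print(f"{name} at {frame_ms:g} ms frames: {bits:g} bits per frame")
```

Under the assumed 20 ms frame, 1 kbps leaves 20 bits per frame compared with 120 bits at 6 kbps, which illustrates why the lower rate is the harder operating point.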

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Data Compression Techniques