MDCTCodec: A Lightweight MDCT-based Neural Audio Codec towards High Sampling Rate and Low Bitrate Scenarios
Xiao-Hang Jiang, Yang Ai, Rui-Chen Zheng, Hui-Peng Du, Ye-Xin Lu, Zhen-Hua Ling

TL;DR
MDCTCodec is a lightweight neural audio codec utilizing MDCT for efficient high-quality audio compression at high sampling rates and low bitrates, with novel adversarial training and compact design.
Contribution
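The paper introduces an MDCT-based neural audio codec trained adversarially with a multi-resolution MDCT-based discriminator (MR-MDCTD). It targets high decoded audio quality at low bitrates and high sampling rates, with improved training and generation efficiency and a compact model size.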
Findings
Achieved a ViSQOL score of 4.18 at a 48 kHz sampling rate and a 6 kbps bitrate.
Showed higher decoded audio quality, better training and generation efficiency, and a more compact model than baseline codecs.
Produced high-quality audio reconstruction in high sampling rate scenarios.
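To put the 6 kbps figure in context, the arithmetic below is a rough sketch of how the bitrate of a residual-vector-quantized codec is usually counted: frames per second, times the number of quantizer stages, times the bits per codebook index. The hop size, stage count, and codebook size are hypothetical values chosen only because they multiply out to 6 kbps. They are not taken from the paper, and the 16-bit mono PCM reference is likewise an assumption.

```python
import math

def rvq_bitrate_bps(frame_rate_hz, n_quantizers, codebook_size):
    """Bits per second when each frame emits one index per quantizer stage."""
    return frame_rate_hz * n_quantizers * math.log2(codebook_size)

# Hypothetical configuration (NOT from the paper), picked to land on 6 kbps.
sample_rate_hz = 48_000
hop_samples = 320          # assumed hop -> 150 frames per second
n_quantizers = 4           # assumed number of RVQ stages
codebook_size = 1024       # assumed entries per codebook -> 10 bits per index

frame_rate_hz = sample_rate_hz / hop_samples
codec_bps = rvq_bitrate_bps(frame_rate_hz, n_quantizers, codebook_size)

# Reference point: uncompressed 16-bit mono PCM at the same sampling rate (assumption).
pcm_bps = sample_rate_hz * 16
compression_ratio = pcm_bps / codec_bps

print(codec_bps, pcm_bps, compression_ratio)
```

Under these assumed values, the codec stream is 128 times smaller than the 16-bit PCM reference. Other combinations of frame rate, stage count, and codebook size give the same bitrate.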
Abstract
In this paper, we propose MDCTCodec, an efficient lightweight end-to-end neural audio codec based on the modified discrete cosine transform (MDCT). The encoder takes the MDCT spectrum of audio as input, encoding it into a continuous latent code which is then discretized by a residual vector quantizer (RVQ). Subsequently, the decoder decodes the MDCT spectrum from the quantized latent code and reconstructs audio via inverse MDCT. During the training phase, a novel multi-resolution MDCT-based discriminator (MR-MDCTD) is adopted to discriminate the natural or decoded MDCT spectrum for adversarial training. Experimental results confirm that, in scenarios with high sampling rates and low bitrates, the MDCTCodec exhibited high decoded audio quality, improved training and generation efficiency, and compact model size compared to baseline codecs. Specifically, the MDCTCodec achieved a ViSQOL…
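The analysis and synthesis transform that the abstract builds on can be sketched in a few lines of NumPy. This is a minimal textbook MDCT/inverse-MDCT pair with a sine window and 50% overlap, not the authors' implementation: the function names, the window choice, the padding scheme, and the frame size used in the demo are all assumptions made for illustration. It shows why the MDCT suits a codec: each hop of N samples yields exactly N real coefficients, and overlap-add of the inverse transform recovers the waveform exactly.

```python
import numpy as np

def mdct_basis(n_bins):
    """Cosine basis of shape (N, 2N) for frames of 2N samples."""
    n = np.arange(2 * n_bins)
    k = np.arange(n_bins)
    return np.cos(np.pi / n_bins * np.outer(k + 0.5, n + 0.5 + n_bins / 2))

def sine_window(n_bins):
    """Sine window; satisfies w[n]^2 + w[n + N]^2 = 1 (Princen-Bradley)."""
    n = np.arange(2 * n_bins)
    return np.sin(np.pi * (n + 0.5) / (2 * n_bins))

def mdct(signal, n_bins):
    """Return an (n_frames, N) real spectrum using hop N and frame length 2N."""
    # Pad N zeros on each side so every input sample is covered by two frames.
    pad_end = (-len(signal)) % n_bins
    padded = np.concatenate([np.zeros(n_bins), signal, np.zeros(pad_end + n_bins)])
    n_frames = len(padded) // n_bins - 1
    frames = np.stack([padded[i * n_bins:(i + 2) * n_bins] for i in range(n_frames)])
    return (frames * sine_window(n_bins)) @ mdct_basis(n_bins).T

def imdct(spectrum, n_bins, length):
    """Invert mdct(): per-frame inverse, synthesis window, overlap-add, unpad."""
    frames = (2.0 / n_bins) * (spectrum @ mdct_basis(n_bins)) * sine_window(n_bins)
    out = np.zeros((spectrum.shape[0] + 1) * n_bins)
    for i, frame in enumerate(frames):
        out[i * n_bins:(i + 2) * n_bins] += frame  # time-domain aliasing cancels here
    return out[n_bins:n_bins + length]

# Round trip on random data; N = 64 is an arbitrary demo value.
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
X = mdct(x, 64)
x_hat = imdct(X, 64, len(x))
print(X.shape, np.allclose(x, x_hat))
```

In a codec of the kind described, the encoder, quantizer, and decoder would sit between `mdct` and `imdct`, so the network only has to predict a real-valued spectrum rather than a waveform or a complex magnitude and phase pair.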
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Advanced Data Compression Techniques
