A DNN Based Post-Filter to Enhance the Quality of Coded Speech in MDCT Domain
Kishan Gupta, Srikanth Korse, Bernd Edler, Guillaume Fuchs

TL;DR
This paper introduces a real-time, mask-based post-filter that uses a lightweight neural network to enhance low-bitrate, MDCT-coded speech without adding delay, significantly improving audio quality.
Contribution
It presents a novel post-filter that operates in the MDCT domain and is driven by a neural network. The filter improves speech quality at low bitrates without adding delay and at relatively low complexity.
Findings
Achieves a 10-point MUSHRA improvement over the standard LC3 codec.
Operates directly in the MDCT domain without extra delay.
Uses a lightweight convolutional encoder-decoder network.
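To make "lightweight convolutional encoder-decoder" concrete, here is a toy mask estimator that maps one frame of MDCT coefficients to one real-valued mask value per bin. Every architectural detail is an assumption for illustration, not the paper's configuration: log-magnitude input features, two stride-2 convolutions as the encoder, nearest-neighbour upsampling plus convolution as the decoder, a sigmoid output, and random untrained weights. `TinyMaskNet` and its helper functions are hypothetical names.

```python
import math
import random

def conv1d(x, weights, bias, stride=1):
    """1-D convolution with zero 'same' padding.

    x: [in_ch][length], weights: [out_ch][in_ch][k], bias: [out_ch].
    Output length is ceil(length / stride).
    """
    length, k = len(x[0]), len(weights[0][0])
    pad = k // 2
    out = []
    for w_o, b_o in zip(weights, bias):
        row = []
        for i in range(0, length, stride):
            acc = b_o
            for w_c, x_c in zip(w_o, x):
                for j, w in enumerate(w_c):
                    p = i + j - pad
                    if 0 <= p < length:
                        acc += w * x_c[p]
            row.append(acc)
        out.append(row)
    return out

def relu(x):
    return [[max(0.0, v) for v in row] for row in x]

def upsample2(x):
    # Nearest-neighbour upsampling by a factor of 2 along the frequency axis.
    return [[v for v in row for _ in range(2)] for row in x]

def sigmoid(v):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)

def init_layer(rng, out_ch, in_ch, k):
    # Random (untrained) weights scaled by fan-in; biases start at zero.
    scale = 1.0 / math.sqrt(in_ch * k)
    weights = [[[rng.gauss(0.0, scale) for _ in range(k)] for _ in range(in_ch)]
               for _ in range(out_ch)]
    return weights, [0.0] * out_ch

class TinyMaskNet:
    """Hypothetical encoder-decoder: one frame of MDCT coefficients -> mask."""

    def __init__(self, hidden=4, k=3, seed=0):
        rng = random.Random(seed)
        self.enc1 = init_layer(rng, hidden, 1, k)
        self.enc2 = init_layer(rng, 2 * hidden, hidden, k)
        self.dec1 = init_layer(rng, hidden, 2 * hidden, k)
        self.dec2 = init_layer(rng, 1, hidden, k)

    def num_params(self):
        layers = (self.enc1, self.enc2, self.dec1, self.dec2)
        return sum(len(b) + sum(len(w_c) for w_o in w for w_c in w_o)
                   for w, b in layers)

    def __call__(self, coeffs):
        feats = [[math.log(abs(c) + 1e-6) for c in coeffs]]  # 1 x N log-magnitudes
        h = relu(conv1d(feats, *self.enc1, stride=2))        # hidden x ~N/2
        h = relu(conv1d(h, *self.enc2, stride=2))            # 2*hidden x ~N/4
        h = relu(conv1d(upsample2(h), *self.dec1))           # hidden x ~N/2
        logits = conv1d(upsample2(h), *self.dec2)[0]         # 1 x >= N
        return [sigmoid(v) for v in logits[:len(coeffs)]]    # one value per bin
```

Because the convolutions share weights across frequency, the parameter count (233 for `hidden=4`, `k=3`) does not grow with the number of MDCT bins, which is one reason convolutional designs stay small. Bounding the mask with a sigmoid means this toy filter can only attenuate coefficients; that choice is an assumption of the sketch.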
Abstract
Frequency domain processing, and in particular the use of the Modified Discrete Cosine Transform (MDCT), is the most widespread approach to audio coding. However, at low bitrates, audio quality, especially for speech, degrades drastically due to the lack of available bits to directly code the transform coefficients. Traditionally, post-filtering has been used to mitigate artefacts in the coded speech by exploiting a priori information about the source and extra transmitted parameters. Recently, data-driven post-filters have shown better results, but at the cost of significant additional complexity and delay. In this work, we propose a mask-based post-filter operating directly in the MDCT domain of the codec, inducing no extra delay. The real-valued mask is applied to the quantized MDCT coefficients and is estimated by a relatively lightweight convolutional encoder-decoder network. Our solution…
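The central idea of the abstract, a real-valued mask multiplied onto the quantized MDCT coefficients before the decoder's inverse transform, can be illustrated with a small pure-Python sketch. It assumes a sine window, 2N-sample frames with a hop of N samples, a plain uniform quantizer standing in for the codec's actual coefficient coding (LC3's quantization is far more elaborate), and a caller-supplied `mask_fn`. The names `codec_with_postfilter`, `quantize`, `apply_mask`, and `mask_fn` are hypothetical. In this sketch each frame's mask depends only on that frame's coefficients, so the post-filter needs no samples beyond what MDCT synthesis already buffers, which is one way to see why an MDCT-domain post-filter need not add delay.

```python
import math
import random

def sine_window(two_n):
    # Sine window; satisfies Princen-Bradley: w[k]^2 + w[k + N]^2 = 1.
    return [math.sin(math.pi * (k + 0.5) / two_n) for k in range(two_n)]

def mdct(frame, window):
    # Direct O(N^2) MDCT of one 2N-sample frame (analysis window applied here).
    two_n = len(frame)
    n = two_n // 2
    n0 = 0.5 + n / 2
    return [sum(window[t] * frame[t] * math.cos(math.pi / n * (t + n0) * (k + 0.5))
                for t in range(two_n))
            for k in range(n)]

def imdct(coeffs, window):
    # Inverse MDCT scaled by 2/N, then synthesis-windowed. With a Princen-Bradley
    # window, overlap-adding consecutive frames cancels time-domain aliasing.
    n = len(coeffs)
    n0 = 0.5 + n / 2
    return [window[t] * (2.0 / n) *
            sum(coeffs[k] * math.cos(math.pi / n * (t + n0) * (k + 0.5)) for k in range(n))
            for t in range(2 * n)]

def quantize(coeffs, step):
    # Uniform quantizer (round to nearest multiple of step); a coarse step
    # (few bits) zeroes many small coefficients.
    return [step * round(c / step) for c in coeffs]

def apply_mask(coeffs, mask):
    # Element-wise real-valued mask on the quantized MDCT coefficients.
    return [m * c for m, c in zip(mask, coeffs)]

def codec_with_postfilter(signal, n, step=None, mask_fn=None):
    assert len(signal) % n == 0, "signal length must be a multiple of the hop n"
    window = sine_window(2 * n)
    padded = [0.0] * n + list(signal) + [0.0] * n
    out = [0.0] * len(padded)
    for start in range(0, len(signal) + n, n):
        coeffs = mdct(padded[start:start + 2 * n], window)
        if step is not None:
            coeffs = quantize(coeffs, step)               # stand-in for the codec
        if mask_fn is not None:
            coeffs = apply_mask(coeffs, mask_fn(coeffs))  # post-filter, current frame only
        for t, v in enumerate(imdct(coeffs, window)):     # overlap-add synthesis
            out[start + t] += v
    return out[n:n + len(signal)]

if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.uniform(-1.0, 1.0) for _ in range(64)]
    # Identity mask and no quantization: output should match x up to rounding.
    y = codec_with_postfilter(x, n=16, mask_fn=lambda c: [1.0] * len(c))
    print("max reconstruction error:", max(abs(a - b) for a, b in zip(x, y)))
```

With an all-ones mask and no quantization, the overlap-added output reproduces the input up to floating-point rounding, and an all-zeros mask silences the output. A trained network would replace the constant masks with per-bin values estimated from the quantized coefficients.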
Taxonomy
Topics: Speech and Audio Processing · Advanced Data Compression Techniques · Advanced Adaptive Filtering Techniques
Methods: Discrete Cosine Transform
