MFCC-GAN Codec: A New AI-based Audio Coding
Mohammad Reza Hasanabadi

TL;DR
This paper introduces MFCC-GAN Codec, an AI-based audio coding method that uses adversarial learning with MFCC features to achieve high-quality audio reconstruction at lower bitrates compared to traditional codecs.
Contribution
The paper presents a novel GAN-based audio codec leveraging MFCC features, achieving state-of-the-art SNR and perceptual quality at significantly reduced bitrates.
Findings
MFCCGAN_36k achieves higher SNR than AC3_128k, AAC_112k, Vorbis_48k, Opus_48k, and Speex_48k despite its lower bitrate.
MFCCGAN_13k matches the SNR of AC3_128k and AAC_112k (SNR = 27) at only 13 kbps.
MFCCGAN models yield higher NISQA-MOS scores than traditional codecs at reduced bitrates.
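The findings above rank codecs by SNR. This excerpt does not define the exact SNR formula used. The following is a minimal sketch that assumes the common waveform SNR in dB: ten times log10 of the reference energy over the error energy, computed on time-aligned signals of equal length. `snr_db` is a hypothetical helper name, not code from the paper.

```python
import numpy as np


def snr_db(reference, reconstructed):
    """Waveform SNR in dB: 10*log10(reference energy / error energy).

    Assumes both signals are already time-aligned and of equal length;
    real codec evaluations usually have to compensate for codec delay first.
    """
    ref = np.asarray(reference, dtype=np.float64)
    rec = np.asarray(reconstructed, dtype=np.float64)
    if ref.shape != rec.shape:
        raise ValueError("signals must have the same shape")
    error_energy = np.sum((ref - rec) ** 2)
    if error_energy == 0.0:
        return float("inf")  # perfect reconstruction
    return float(10.0 * np.log10(np.sum(ref ** 2) / error_energy))


if __name__ == "__main__":
    sr = 16000  # assumed sample rate for the toy signal
    t = np.arange(sr) / sr
    clean = np.sin(2 * np.pi * 440.0 * t)
    # A reconstruction with a 10% amplitude error has error energy 1/100 of
    # the signal energy, i.e. an SNR of about 20 dB.
    print(f"{snr_db(clean, 1.1 * clean):.1f} dB")
```

Because SNR is a sample-by-sample measure, it rewards waveform fidelity. A perceptual score such as NISQA-MOS can rank codecs differently, which is why the findings report both.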
Abstract
In this paper, we propose AI-based audio coding using MFCC features in an adversarial setting. We combine a conventional encoder with an adversarially trained decoder to better reconstruct the original waveform. Since GANs perform implicit density estimation, such models are less prone to overfitting. We compare our work with five well-known codecs, namely AAC, AC3, Opus, Vorbis, and Speex, operating at bitrates from 2 kbps to 128 kbps. MFCCGAN_36k achieved the state-of-the-art result in terms of SNR despite having a lower bitrate than AC3_128k, AAC_112k, Vorbis_48k, Opus_48k, and Speex_48k. MFCCGAN_13k also achieved a high SNR of 27, equal to that of AC3_128k and AAC_112k, while having a significantly lower bitrate (13 kbps). MFCCGAN_36k achieved a higher NISQA-MOS than AAC_48k while having a 20% lower bitrate. Furthermore, MFCCGAN_13k…
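The abstract states only that a conventional encoder supplies MFCC features to the adversarial decoder. It does not give frame sizes, filterbank settings, or how the coefficients are quantized to reach 13 or 36 kbps. The sketch below shows a textbook MFCC front end in NumPy under assumed parameters: 16 kHz audio, 25 ms Hamming frames with a 10 ms hop, 40 HTK-style mel bands, and 13 coefficients. All function names are hypothetical, and this is not the author's encoder. Quantization, bitstream packing, and the GAN decoder are omitted.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale, shape (n_mels, n_fft//2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):  # rising edge
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb


def dct2_ortho(n):
    """Orthonormal DCT-II matrix, shape (n, n)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (i + 0.5) * k / n)
    mat[0] /= np.sqrt(2.0)
    return mat


def mfcc(signal, sr=16000, n_mfcc=13, n_mels=40, frame_len=400, hop=160, n_fft=512):
    """Return an (n_frames, n_mfcc) MFCC matrix for a mono signal."""
    x = np.asarray(signal, dtype=np.float64)
    if len(x) < frame_len:
        x = np.pad(x, (0, frame_len - len(x)))
    n_frames = 1 + (len(x) - frame_len) // hop
    # Slice overlapping frames and apply a Hamming window
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hamming(frame_len)
    # Power spectrum of each frame (zero-padded to n_fft)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    # Mel-band energies, log compression, then DCT along the band axis
    log_mel = np.log(power @ mel_filterbank(n_mels, n_fft, sr).T + 1e-10)
    return (log_mel @ dct2_ortho(n_mels).T)[:, :n_mfcc]


if __name__ == "__main__":
    t = np.arange(16000) / 16000.0
    feats = mfcc(np.sin(2 * np.pi * 440.0 * t))
    print(feats.shape)  # (98, 13): 98 frames of 13 coefficients
```

The DCT step decorrelates the log mel energies, so a few leading coefficients summarize each frame's spectral envelope compactly. Phase and fine harmonic detail are discarded along the way. That missing detail is the kind of gap a learned, adversarially trained decoder is meant to fill when reconstructing the waveform.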
Taxonomy
Topics: Music and Audio Processing · Image and Signal Denoising Methods · Speech and Audio Processing
