A Survey of Advancing Audio Super-Resolution and Bandwidth Extension from Discriminative to Generative Models
Ningyuan Yang, Yize Li, Diego A. Cuji, Ryan M. Corey, Pu Zhao, Xue Lin, Andrew C. Singer

TL;DR
This survey reviews the evolution of audio super-resolution from traditional discriminative models to modern generative approaches, highlighting key techniques, challenges, and future directions.
Contribution
It provides a comprehensive taxonomy and unified perspective on generative models for audio bandwidth extension and super-resolution, guiding future research.
Findings
Generative models improve perceptual quality over discriminative methods.
Key design trade-offs include fidelity, robustness, and computational efficiency.
Emerging directions involve large language models and multimodal foundation models.
Abstract
Audio super-resolution (SR), also referred to as bandwidth extension (BWE), aims to reconstruct high-fidelity signals from low-resolution (LR) or band-limited (BL) observations, an inherently ill-posed task due to the ambiguity of missing high-frequency (HF) content. This survey provides a comprehensive overview of the field, with a particular focus on the paradigm shift from discriminative mapping to modern generative modeling. We first review early discriminative deep neural network (DNN) models, which formulate BWE/SR as a deterministic mapping problem and are prone to regression-to-the-mean effects and spectral over-smoothing. We then systematically review generative approaches, including autoregressive (AR) models, variational autoencoders (VAEs), generative adversarial networks (GANs), diffusion and score-based models, flow-based methods, and Schrödinger bridges. Across these…
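The regression-to-the-mean effect mentioned in the abstract can be illustrated with a toy example. The sketch below is not taken from the survey; it assumes a hypothetical setting in which a single band-limited input is consistent with many plausible HF completions that differ only in phase. The estimator that minimizes mean squared error is then the average of those completions, which loses almost all HF energy, whereas any single sampled completion keeps it. The function names, the normalized frequency of 0.3 cycles per sample, and the candidate count are illustrative assumptions.

```python
import numpy as np

def hf_candidates(num_candidates=2000, num_samples=256, seed=0):
    """Plausible HF completions for one LR input (toy assumption).

    Each candidate is a sinusoid at the same normalized frequency
    but with a random phase, mimicking the ambiguity of missing HF content.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(num_samples)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=(num_candidates, 1))
    return np.sin(2.0 * np.pi * 0.3 * t + phases)

def energy(x):
    """Mean squared amplitude of a signal."""
    return float(np.mean(x ** 2))

candidates = hf_candidates()

# A deterministic model trained with MSE converges toward the conditional
# mean, which here is the average over all candidates.
mse_optimal = candidates.mean(axis=0)

# A generative model instead draws one plausible candidate.
sampled = candidates[0]

print("HF energy, one sampled candidate:", energy(sampled))
print("HF energy, MSE-optimal average:  ", energy(mse_optimal))
```

In this toy setting the averaged output is close to silence in the HF band, which is one way to read the "spectral over-smoothing" attributed to discriminative models; real systems are more complex, and the example only conveys the intuition.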
