LatentFlowSR: High-Fidelity Audio Super-Resolution via Noise-Robust Latent Flow Matching
Fei Liu, Yang Ai, Hui-Peng Du, Yu-Fei Shi, Zhen-Hua Ling

TL;DR
LatentFlowSR introduces a latent-space approach to high-fidelity audio super-resolution, combining conditional flow matching with a noise-robust autoencoder to improve reconstruction quality across diverse audio types.
Contribution
The paper proposes a new latent-space super-resolution method using conditional flow matching and a noise-robust autoencoder, improving performance on complex audio beyond speech.
Findings
Outperforms baseline methods on various audio types and settings.
Demonstrates strong high-frequency reconstruction and generalization.
Uses a one-step ODE solver for efficient generation in the latent space.
Abstract
Audio super-resolution aims to recover missing high-frequency details from bandwidth-limited low-resolution audio, thereby improving the naturalness and perceptual quality of the reconstructed signal. However, most existing methods directly operate in the waveform or time-frequency domain, which not only involves high-dimensional generation spaces but is also largely limited to speech tasks, leaving substantial room for improvement on more complex audio types such as sound effects and music. To mitigate these limitations, we introduce LatentFlowSR, a new audio super-resolution approach that leverages conditional flow matching (CFM) within a latent representation space. Specifically, we first train a noise-robust autoencoder, which encodes low-resolution audio into a continuous latent space. Conditioned on the low-resolution latent representation, a CFM mechanism progressively generates…
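To make the conditional flow matching idea and the one-step ODE solver more concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: it assumes a straight-line probability path between a starting latent and a target latent, assumes a single Euler step as the solver, and replaces the trained velocity network with a hypothetical oracle. The names cfm_training_pair, cfm_loss, euler_one_step, and make_oracle_velocity are invented for this example.

```python
import random


def cfm_training_pair(z_start, z_target, t):
    """Return a point on the straight-line path and its target velocity.

    Assumption: linear path z_t = (1 - t) * z_start + t * z_target,
    whose velocity is constant and equal to z_target - z_start.
    """
    z_t = [(1.0 - t) * a + t * b for a, b in zip(z_start, z_target)]
    velocity = [b - a for a, b in zip(z_start, z_target)]
    return z_t, velocity


def cfm_loss(predicted_velocity, target_velocity):
    """Mean squared error between predicted and target velocities."""
    n = len(target_velocity)
    return sum((p - q) ** 2 for p, q in zip(predicted_velocity, target_velocity)) / n


def euler_one_step(z_start, velocity_fn, cond):
    """Integrate from t = 0 to t = 1 with a single Euler step."""
    v = velocity_fn(z_start, 0.0, cond)
    return [a + b for a, b in zip(z_start, v)]


def make_oracle_velocity(z_target):
    """Hypothetical stand-in for a trained, conditioned velocity network."""
    def velocity_fn(z, t, cond):
        # cond is ignored by this oracle; a trained model would use the
        # low-resolution latent here to predict the velocity.
        return [(b - a) / (1.0 - t) for a, b in zip(z, z_target)]
    return velocity_fn


if __name__ == "__main__":
    random.seed(0)
    z0 = [random.gauss(0.0, 1.0) for _ in range(4)]  # starting latent (toy)
    z1 = [0.5, -1.0, 2.0, 0.0]                       # target latent (toy)
    z_t, v = cfm_training_pair(z0, z1, t=0.25)
    print("loss with perfect prediction:", cfm_loss(v, v))
    print("one-step sample:", euler_one_step(z0, make_oracle_velocity(z1), cond=None))
```

With a straight-line path the true velocity is constant along the trajectory, which is why a single Euler step can land on the target when the velocity estimate is accurate. A real system would train a network to minimise cfm_loss over random values of t and decode the generated latent back to a waveform.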
