Fast and Flexible Audio Bandwidth Extension via Vocos

Yatharth Sharma

arXiv:2603.07285 · eess.AS · March 10, 2026

TL;DR

This paper introduces a neural vocoder-based bandwidth extension (BWE) model that regenerates missing high-frequency content for audio sampled at 8 to 48 kHz, supporting arbitrary upsampling ratios while running faster than real time.
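
Because every input is resampled to 48 kHz before it reaches the vocoder, the only things that change between input rates are the resampling ratio and the band left to fill. The sketch below is a minimal illustration, not the paper's code: it computes both for a given input rate. `resample_plan` and `TARGET_SR` are hypothetical names, and the paper does not say which resampling filter it uses. The reduced `up`/`down` integers are the form a polyphase resampler such as `scipy.signal.resample_poly(x, up, down)` expects.

```python
from fractions import Fraction

TARGET_SR = 48_000  # fixed working rate of the model, per the abstract (hypothetical constant name)


def resample_plan(input_sr: int, target_sr: int = TARGET_SR) -> dict:
    """Integer up/down factors to reach target_sr, plus the band the model must generate.

    Fraction reduces target_sr / input_sr to lowest terms, which are the
    factors a polyphase resampler would use. Hypothetical helper for illustration.
    """
    if not 0 < input_sr <= target_sr:
        raise ValueError("input_sr must satisfy 0 < input_sr <= target_sr")
    ratio = Fraction(target_sr, input_sr)
    return {
        "up": ratio.numerator,
        "down": ratio.denominator,
        # Everything below the input's Nyquist frequency is already present...
        "known_band_hz": (0.0, input_sr / 2),
        # ...and everything from there up to target_sr / 2 (24 kHz by default) must be synthesised.
        "missing_band_hz": (input_sr / 2, target_sr / 2),
    }


if __name__ == "__main__":
    for sr in (8_000, 16_000, 22_050, 44_100):
        print(sr, resample_plan(sr))
```

For example, `resample_plan(22050)` gives `up=320`, `down=147`, and a missing band from 11025 Hz to 24000 Hz, while an 8 kHz input needs 6x upsampling and leaves 4 to 24 kHz to be generated. One network can serve all of these because, after resampling, they differ only in how much of the spectrum is empty.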

Contribution

It presents a novel Vocos-based model that pairs a neural vocoder backbone with a lightweight, Linkwitz-Riley-inspired refiner, giving flexible, high-quality audio bandwidth extension at high throughput.
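
For context, the original Vocos vocoder differs from vocoders that upsample through stacks of transposed convolutions. Its backbone predicts a log-magnitude and a phase for every STFT bin at frame rate, and an inverse STFT produces the waveform. This summary does not say how the BWE model configures that stage, so the following is only a NumPy sketch of a Vocos-style output head under assumed settings (`n_fft=512`, `hop=128`, a periodic Hann window, magnitudes capped at 100). All function names are hypothetical. The round trip in the `__main__` block checks that feeding true log-magnitudes and phases through the head reconstructs the signal.

```python
import numpy as np


def periodic_hann(n_fft: int) -> np.ndarray:
    # np.hanning is symmetric; dropping the last sample of an (n_fft + 1)-point window gives the periodic form
    return np.hanning(n_fft + 1)[:-1]


def stft(x: np.ndarray, n_fft: int, hop: int) -> np.ndarray:
    """Frame x with a periodic Hann window; returns shape (frames, n_fft // 2 + 1)."""
    window = periodic_hann(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def vocos_style_head(log_mag: np.ndarray, phase: np.ndarray) -> np.ndarray:
    """Combine predicted log-magnitude and phase into complex STFT coefficients (hypothetical helper)."""
    mag = np.minimum(np.exp(log_mag), 1e2)  # exponentiate, then cap very large magnitudes (assumed cap)
    return mag * (np.cos(phase) + 1j * np.sin(phase))


def istft(spec: np.ndarray, n_fft: int, hop: int) -> np.ndarray:
    """Inverse STFT by weighted overlap-add, normalised by the summed squared window."""
    window = periodic_hann(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
    length = n_fft + hop * (len(frames) - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + n_fft] += frame
        norm[i * hop : i * hop + n_fft] += window**2
    return out / np.maximum(norm, 1e-8)


if __name__ == "__main__":
    # Round trip: analyse a test signal, pass its log-magnitude and phase through the head, resynthesise.
    rng = np.random.default_rng(0)
    x = 0.1 * rng.standard_normal(48_000)
    spec = stft(x, n_fft=512, hop=128)
    y = istft(vocos_style_head(np.log(np.abs(spec) + 1e-12), np.angle(spec)), n_fft=512, hop=128)
    # Away from the edges, where window overlap is incomplete, y matches x
    print(np.max(np.abs(y[512:-512] - x[512:-512])))
```

Predicting in the STFT domain keeps the network at frame rate and leaves the jump to sample rate to the inverse FFT, which is one reason Vocos-style models are fast.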

Findings

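1. Achieves competitive log-spectral distance (LSD) on validation data.
2. Runs far faster than real time, with a real-time factor of 0.0001 on an NVIDIA A100 GPU and 0.0053 on an 8-core CPU.
3. Supports arbitrary upsampling ratios with a single network, because every input is first resampled to 48 kHz.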
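
The first two findings rest on two standard quantities. A common definition of log-spectral distance averages, over STFT frames, the root-mean-square difference between the reference and estimated log10 power spectra. A real-time factor (RTF) is processing time divided by audio duration, so values below 1 are faster than real time. LSD variants differ (full band or high band only, power or magnitude, STFT size), and the exact protocol behind the reported numbers is not stated here, so `n_fft=2048`, `hop=512`, and `eps` below are assumptions and the function names are hypothetical.

```python
import numpy as np


def power_spectrogram(x: np.ndarray, n_fft: int = 2048, hop: int = 512) -> np.ndarray:
    """|STFT|^2 with a periodic Hann window; shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft + 1)[:-1]
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def log_spectral_distance(ref, est, n_fft=2048, hop=512, eps=1e-10) -> float:
    """Mean over frames of the RMS (over frequency) difference of log10 power spectra."""
    log_ref = np.log10(power_spectrogram(ref, n_fft, hop) + eps)
    log_est = np.log10(power_spectrogram(est, n_fft, hop) + eps)
    per_frame = np.sqrt(np.mean((log_ref - log_est) ** 2, axis=1))
    return float(np.mean(per_frame))


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Wall-clock processing time divided by the duration of the audio processed."""
    return processing_seconds / audio_seconds
```

Read this way, the reported RTF of 0.0001 on the A100 means an hour of audio (3600 s) would take about 0.36 s, and 0.0053 on the 8-core CPU about 19 s. Lower LSD is better; identical signals score 0.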

Abstract

We propose a Vocos-based bandwidth extension model that enhances audio at 8-48 kHz by generating missing high-frequency content. Inputs are resampled to 48 kHz and processed by a neural vocoder backbone, enabling a single network to support arbitrary upsampling ratios. A lightweight Linkwitz-Riley-inspired refiner merges the original low band with the generated high frequencies via a smooth crossover. On validation, the model achieves competitive log-spectral distance while running at a real-time factor of 0.0001 on an NVIDIA A100 GPU and 0.0053 on an 8-core CPU, demonstrating practical, high-quality BWE at extreme throughput.
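
The refiner is described only as lightweight and Linkwitz-Riley-inspired, so the following is not the paper's refiner (which may well be learned). It is a fixed-filter sketch of the classical fourth-order Linkwitz-Riley (LR4) crossover the refiner is named after: keep the low band of the resampled input, take the high band from the generated signal, and add them. Each LR4 branch is a second-order Butterworth biquad applied twice, with coefficients from the RBJ Audio EQ Cookbook. `crossover_merge` and the other names are hypothetical, and the crossover frequency `fc` is an assumed free parameter.

```python
import math

import numpy as np


def butter2_coeffs(fc: float, fs: float, kind: str):
    """2nd-order Butterworth (Q = 1/sqrt(2)) biquad coefficients from the RBJ cookbook formulas."""
    w0 = 2.0 * math.pi * fc / fs
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2.0 * (1.0 / math.sqrt(2.0)))
    if kind == "low":
        b = ((1 - cos_w0) / 2, 1 - cos_w0, (1 - cos_w0) / 2)
    elif kind == "high":
        b = ((1 + cos_w0) / 2, -(1 + cos_w0), (1 + cos_w0) / 2)
    else:
        raise ValueError("kind must be 'low' or 'high'")
    a = (1 + alpha, -2 * cos_w0, 1 - alpha)
    return b, a


def biquad(x: np.ndarray, b, a) -> np.ndarray:
    """Direct-form I biquad filter, normalised by a[0]."""
    b0, b1, b2 = (c / a[0] for c in b)
    a1, a2 = a[1] / a[0], a[2] / a[0]
    y = np.zeros(len(x))
    x1 = x2 = y1 = y2 = 0.0
    for n, xn in enumerate(x):
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        y[n] = yn
        x1, x2 = xn, x1
        y1, y2 = yn, y1
    return y


def linkwitz_riley4(x: np.ndarray, fc: float, fs: float, kind: str) -> np.ndarray:
    """LR4: the same 2nd-order Butterworth section applied twice."""
    b, a = butter2_coeffs(fc, fs, kind)
    return biquad(biquad(x, b, a), b, a)


def crossover_merge(original_48k, generated_48k, fc, fs=48_000.0):
    """Low band from the resampled input plus high band from the generated signal (hypothetical helper)."""
    low = linkwitz_riley4(original_48k, fc, fs, "low")
    high = linkwitz_riley4(generated_48k, fc, fs, "high")
    return low + high
```

In a BWE setting, `original_48k` would be the input after resampling to 48 kHz, `generated_48k` the vocoder output, and `fc` somewhere at or below the input's Nyquist frequency. LR4 suits this kind of merge because its low-pass and high-pass outputs are in phase and sum to an all-pass response, so feeding the same signal into both branches leaves its magnitude spectrum unchanged, with no bump or notch at the crossover.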

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Hearing Loss and Rehabilitation