A2SB: Audio-to-Audio Schrödinger Bridges
Zhifeng Kong, Kevin J Shih, Weili Nie, Arash Vahdat, Sang-gil Lee, Joao Felipe Santos, Ante Jukic, Rafael Valle, Bryan Catanzaro

TL;DR
A2SB is an end-to-end audio restoration model that enhances high-res music by extending bandwidth and inpainting missing segments without vocoders, achieving state-of-the-art results on various music datasets.
Contribution
Introduces A2SB, a novel end-to-end audio-to-audio Schrödinger Bridges model for high-quality music restoration, capable of bandwidth extension and inpainting without vocoders.
Findings
Achieves state-of-the-art quality in bandwidth extension.
Effectively inpaints missing audio segments.
Operates on hour-long music inputs.
Abstract
Real-world audio is often degraded by numerous factors. This work presents an audio restoration model tailored for high-res music at 44.1 kHz. Our model, Audio-to-Audio Schrödinger Bridges (A2SB), is capable of both bandwidth extension (predicting high-frequency components) and inpainting (re-generating missing segments). Critically, A2SB is end-to-end, requiring no vocoder to predict waveform outputs; it can restore hour-long audio inputs and is trained on permissively licensed music data. A2SB achieves state-of-the-art bandwidth extension and inpainting quality on several out-of-distribution music test sets.
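To make the two restoration tasks concrete, the following is a minimal NumPy sketch of the degradations they undo: a band-limited signal (the input to bandwidth extension) and a signal with a silenced gap (the input to inpainting). It is an illustration only, not the authors' data pipeline. The helper names `lowpass_fft` and `mask_segment`, the brick-wall filter, the 4 kHz cutoff, and the test tones are all assumptions chosen for the example; only the 44.1 kHz sample rate comes from the abstract.

```python
import numpy as np

SAMPLE_RATE = 44_100  # Hz, the rate named in the abstract


def lowpass_fft(wave, cutoff_hz, sample_rate=SAMPLE_RATE):
    """Zero every frequency bin above cutoff_hz (hypothetical brick-wall filter)."""
    spectrum = np.fft.rfft(wave)
    freqs = np.fft.rfftfreq(len(wave), d=1.0 / sample_rate)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(wave))


def mask_segment(wave, start_s, end_s, sample_rate=SAMPLE_RATE):
    """Return a copy of wave with the samples in [start_s, end_s) seconds silenced."""
    out = wave.copy()
    start = round(start_s * sample_rate)
    end = round(end_s * sample_rate)
    out[start:end] = 0.0
    return out


# One second of a toy "clean" signal: a 440 Hz tone plus a 12 kHz overtone.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
clean = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 12_000 * t)

# Input a bandwidth-extension model would see: content above 4 kHz is gone.
band_limited = lowpass_fft(clean, cutoff_hz=4_000)

# Input an inpainting model would see: 0.4 s to 0.6 s is missing.
gapped = mask_segment(clean, start_s=0.4, end_s=0.6)
```

A restoration model is then asked to recover something close to `clean` from `band_limited` or `gapped`. Real degradations are less tidy than a brick-wall cutoff or a perfectly silent gap, so this should be read as a description of the task, not of how the training data was produced.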
Taxonomy
Topics: Music Technology and Sound Studies
Methods: Inpainting
