Audio Decoding by Inverse Problem Solving
Pedro J. Villasana T., Lars Villemoes, Janusz Klejsa, Per Hedelin

TL;DR
This paper treats audio decoding as an inverse problem solved through diffusion posterior sampling, and reports improved decoding over legacy methods across a broad range of content types and bitrates.
Contribution
It develops explicit conditioning functions for the measurements provided by a transform domain perceptual codec and, through the noisy mean model, reduces the number of gradient evaluations needed for diffusion posterior sampling.
Findings
Significant improvements on piano, with speech performance maintained, when a speech model is replaced by a joint model trained on both speech and piano.
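Improved decoding compared to legacy methods for a broad range of content types and bitrates when a more general music model is used.
Significantly fewer gradient evaluations for diffusion posterior sampling with the noisy mean model, compared to methods based on Tweedie's mean.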
Abstract
We consider audio decoding as an inverse problem and solve it through diffusion posterior sampling. Explicit conditioning functions are developed for input signal measurements provided by an example of a transform domain perceptual audio codec. Viability is demonstrated by evaluating arbitrary pairings of a set of bitrates and task-agnostic prior models. For instance, we observe significant improvements on piano while maintaining speech performance when a speech model is replaced by a joint model trained on both speech and piano. With a more general music model, improved decoding compared to legacy methods is obtained for a broad range of content types and bitrates. The noisy mean model, underlying the proposed derivation of conditioning, enables a significant reduction of gradient evaluations for diffusion posterior sampling, compared to methods based on Tweedie's mean. Combining…
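To make the "decoding as an inverse problem" idea concrete, here is a minimal toy sketch. It is not the paper's method. It assumes a uniform scalar quantizer as a stand-in for the codec measurement and a zero-mean Gaussian prior in place of a learned diffusion prior, so the posterior mean is available in closed form. All function names and parameters (`quantize`, `midpoint_decode`, `posterior_mean_decode`, `step`, `sigma`) are hypothetical and chosen only for illustration.

```python
import math


def gaussian_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def gaussian_cdf(z):
    """Standard normal cumulative distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def quantize(x, step):
    """Assumed measurement: index of the uniform quantization cell containing x."""
    return math.floor(x / step + 0.5)


def midpoint_decode(index, step):
    """Conventional reconstruction: the centre of the quantization cell."""
    return index * step


def posterior_mean_decode(index, step, sigma):
    """Mean of the assumed prior N(0, sigma^2) restricted to the quantization cell.

    The measurement only says that the coefficient lies in
    [(index - 0.5) * step, (index + 0.5) * step]; the prior decides where
    inside that cell the reconstruction should be placed.
    """
    lo = (index - 0.5) * step / sigma
    hi = (index + 0.5) * step / sigma
    mass = gaussian_cdf(hi) - gaussian_cdf(lo)
    if mass <= 0.0:
        # Cell lies so far in the tail that its probability underflows;
        # fall back to the midpoint.
        return midpoint_decode(index, step)
    # Mean of a zero-mean Gaussian truncated to [lo, hi], in units of sigma.
    return sigma * (gaussian_pdf(lo) - gaussian_pdf(hi)) / mass
```

With `step = 1.0` and `sigma = 1.0`, an input of `0.7` quantizes to index `1`. The midpoint decoder returns `1.0`, while the posterior mean falls inside the same cell but closer to zero, because the assumed prior puts more mass there. In the paper's setting, a learned diffusion prior and posterior sampling take the place of this closed-form Gaussian computation.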
Taxonomy
Topics: Music and Audio Processing
Methods: Sparse Evolutionary Training · Diffusion
