Audio Decoding by Inverse Problem Solving

Pedro J. Villasana T.; Lars Villemoes; Janusz Klejsa; Per Hedelin

arXiv:2409.07858·eess.AS·September 13, 2024

Open Access

TL;DR

This paper treats audio decoding as an inverse problem and solves it with diffusion posterior sampling. With a general music prior model, it reports improved decoding over legacy methods across a broad range of content types and bitrates.

Contribution

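It derives explicit conditioning functions for the input signal measurements of an example transform domain perceptual audio codec. The noisy mean model underlying that derivation significantly reduces the number of gradient evaluations needed for diffusion posterior sampling, compared to methods based on Tweedie's mean.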

Findings

01

Replacing a speech model with a joint model trained on both speech and piano significantly improves piano decoding while maintaining speech performance.

02

With a more general music model, decoding improves over legacy methods for a broad range of content types and bitrates.

03

The noisy mean model significantly reduces gradient evaluations for diffusion posterior sampling, compared to methods based on Tweedie's mean.

Abstract

We consider audio decoding as an inverse problem and solve it through diffusion posterior sampling. Explicit conditioning functions are developed for input signal measurements provided by an example of a transform domain perceptual audio codec. Viability is demonstrated by evaluating arbitrary pairings of a set of bitrates and task-agnostic prior models. For instance, we observe significant improvements on piano while maintaining speech performance when a speech model is replaced by a joint model trained on both speech and piano. With a more general music model, improved decoding compared to legacy methods is obtained for a broad range of content types and bitrates. The noisy mean model, underlying the proposed derivation of conditioning, enables a significant reduction of gradient evaluations for diffusion posterior sampling, compared to methods based on Tweedie's mean. Combining…
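
The abstract contrasts the noisy mean model with methods based on Tweedie's mean. As background only, the following is a minimal sketch of Tweedie's mean on a toy scalar Gaussian prior, where the score is known in closed form. It does not reproduce the paper's codec, prior models, conditioning functions, or noisy mean model; every name and value here (`mu`, `s2`, `sigma2`, `demo`) is an illustrative assumption.

```python
import numpy as np


def gaussian_score(x_t, mu, s2, sigma2):
    """Score (gradient of the log density) of the noisy marginal N(mu, s2 + sigma2)."""
    return -(x_t - mu) / (s2 + sigma2)


def tweedie_mean(x_t, sigma2, score):
    """Tweedie's formula: E[x0 | x_t] = x_t + sigma^2 * score(x_t)."""
    return x_t + sigma2 * score


def demo(seed=0, n=200_000):
    # Toy setup (assumed values): prior x0 ~ N(mu, s2), noisy x_t = x0 + N(0, sigma2).
    rng = np.random.default_rng(seed)
    mu, s2, sigma2 = 1.0, 4.0, 1.0
    x0 = mu + np.sqrt(s2) * rng.standard_normal(n)
    x_t = x0 + np.sqrt(sigma2) * rng.standard_normal(n)

    # Posterior mean estimate via Tweedie's formula and the analytic score.
    est = tweedie_mean(x_t, sigma2, gaussian_score(x_t, mu, s2, sigma2))

    # Closed-form Gaussian posterior mean, used as a cross-check.
    closed_form = (s2 * x_t + sigma2 * mu) / (s2 + sigma2)

    # Error of the Tweedie estimate versus using the noisy sample directly.
    mse_tweedie = np.mean((est - x0) ** 2)
    mse_noisy = np.mean((x_t - x0) ** 2)
    return est, closed_form, mse_tweedie, mse_noisy


if __name__ == "__main__":
    _, _, mse_tweedie, mse_noisy = demo()
    print(f"MSE Tweedie mean: {mse_tweedie:.3f}, MSE noisy sample: {mse_noisy:.3f}")
```

In practical diffusion samplers the score typically comes from a neural network rather than a closed form, so differentiating a measurement likelihood through this estimate generally means differentiating through the network. That is the kind of gradient evaluation the abstract says the noisy mean model reduces.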

Taxonomy

Topics: Music and Audio Processing

Methods: Sparse Evolutionary Training · Diffusion