Embedding-Space Diffusion for Zero-Shot Environmental Sound Classification

Ysobel Sims; Alexandre Mendes; Stephan Chalup

arXiv:2412.03771·cs.SD·July 3, 2025

TL;DR

This paper introduces a novel diffusion model for zero-shot environmental sound classification, demonstrating superior performance over existing generative methods across multiple audio datasets.

Contribution

It adapts and evaluates a diffusion model for zero-shot learning in environmental audio, establishing a new benchmark and advancing generative approaches in this domain.

Findings

01. Diffusion model outperforms baselines on six audio datasets.

02. Generative methods improve zero-shot environmental sound classification.

03. First benchmark of generative methods in this field.

Abstract

Zero-shot learning enables models to generalise to unseen classes by leveraging semantic information, bridging the gap between training and testing sets with non-overlapping classes. While much research has focused on zero-shot learning in computer vision, the application of these methods to environmental audio remains underexplored, with poor performance in existing studies. Generative methods, which have demonstrated success in computer vision, are notably absent from zero-shot environmental sound classification studies. To address this gap, this work investigates generative methods for zero-shot learning in environmental audio. Two successful generative models from computer vision are adapted: a cross-aligned and distribution-aligned variational autoencoder (CADA-VAE) and a leveraging invariant side generative adversarial network (LisGAN). Additionally, we introduced a novel…
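The generative zero-shot recipe the abstract outlines can be illustrated with a small toy sketch: learn a mapping from class semantic vectors to audio features on seen classes, synthesise features for unseen classes from their semantic vectors alone, then classify real unseen samples against the synthetic ones. Everything here is an assumption made for illustration. The data are random, the "generator" is a ridge-regression linear map standing in for a generative model, and the function names (`fit_linear_generator`, `synthesise`, `nearest_centroid_predict`, `run_demo`) are hypothetical. It is not the paper's diffusion model, CADA-VAE, or LisGAN, and it is not taken from the authors' repository.

```python
import numpy as np


def fit_linear_generator(class_sem, feats, labels, ridge=1e-3):
    """Fit W so that class_sem[c] @ W approximates the mean feature of class c."""
    classes = np.unique(labels)
    means = np.stack([feats[labels == c].mean(axis=0) for c in classes])
    S = class_sem[classes]
    # Ridge-regularised least squares: (S^T S + ridge * I) W = S^T means
    return np.linalg.solve(S.T @ S + ridge * np.eye(S.shape[1]), S.T @ means)


def synthesise(W, sem_vec, n, noise, rng):
    """Draw n synthetic feature vectors for one class from its semantic vector."""
    return sem_vec @ W + noise * rng.standard_normal((n, W.shape[1]))


def nearest_centroid_predict(x, centroids, centroid_labels):
    """Assign each row of x the label of the closest class centroid."""
    dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    return centroid_labels[dists.argmin(axis=1)]


def run_demo(seed=0, n_seen=6, n_unseen=2, d_sem=4, d_feat=8,
             per_class=50, noise=0.1):
    """Toy zero-shot experiment; returns accuracy on the unseen classes."""
    rng = np.random.default_rng(seed)
    n_classes = n_seen + n_unseen
    class_sem = rng.standard_normal((n_classes, d_sem))  # toy semantic vectors
    true_map = rng.standard_normal((d_sem, d_feat))      # hidden semantic-to-feature relation

    def sample(classes):
        labels = np.repeat(classes, per_class)
        feats = class_sem[labels] @ true_map
        feats = feats + noise * rng.standard_normal((labels.size, d_feat))
        return feats, labels

    seen = np.arange(n_seen)
    unseen = np.arange(n_seen, n_classes)
    feats_seen, labels_seen = sample(seen)    # training uses seen classes only
    feats_test, labels_test = sample(unseen)  # testing uses unseen classes only

    # Train the stand-in generator on seen classes.
    W = fit_linear_generator(class_sem, feats_seen, labels_seen)

    # Synthesise features for each unseen class and summarise them as centroids.
    centroids = np.stack([
        synthesise(W, class_sem[c], per_class, noise, rng).mean(axis=0)
        for c in unseen
    ])

    preds = nearest_centroid_predict(feats_test, centroids, unseen)
    return float((preds == labels_test).mean())
```

Calling `run_demo()` returns the fraction of unseen-class test samples that were labelled correctly. The key property of the setup is that no real sample from an unseen class is used before test time; only the class semantic vectors are.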

Code & Models

Repositories

ysims/zerodiffusion (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Ultrasonics and Acoustic Wave Propagation

Methods: Diffusion