TADA! Tuning Audio Diffusion Models through Activation Steering

Łukasz Staniszewski; Katarzyna Zaleska; Mateusz Modrzejewski; Kamil Deja

arXiv:2602.11910 · cs.SD · May 20, 2026

TL;DR

This paper reveals a semantic bottleneck in audio diffusion models and introduces activation steering as a novel, effective method for fine-grained musical attribute control, outperforming existing techniques.

Contribution

The study uncovers a bottleneck in shared attention layers and proposes activation steering, establishing a new state of the art in audio concept modulation.

Findings

01 Activation patching reveals a semantic bottleneck in attention layers.

02 Activation steering outperforms prompt, score-space, and weight-space methods.

03 Extensive user study supports the effectiveness of activation steering.

Abstract

Audio diffusion models can synthesize high-fidelity music from text, yet achieving fine-grained control over specific musical attributes remains challenging, as their internal mechanisms for representing high-level concepts are poorly understood. In this work, we use activation patching to demonstrate that recent audio diffusion architectures exhibit a semantic bottleneck, where a small, shared subset of consecutive attention layers controls distinct musical concepts, such as the presence of specific instruments, vocals, or genres. Building on this, we systematically evaluate a broad spectrum of steering paradigms, comparing activation steering against prompt-level, score-space, and weight-space interventions, analyzing the interaction between the steering mechanism and the intervention site. Our new benchmark, supported by an extensive user study, demonstrates that localized activation…
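
The abstract refers to steering activations at a specific intervention site. As a rough illustration of that general idea only, the sketch below builds a steering direction as a difference of mean activations and adds it to the output of one chosen layer in a toy stack of functions. The difference-of-means construction, the function names (`steering_vector`, `run`), the `alpha` strength parameter, and the toy layers are all assumptions made for illustration; the page does not state how the authors compute or apply their steering vectors, and a real audio diffusion model would operate on attention-layer tensors rather than short Python lists.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def steering_vector(with_concept: Sequence[Vector],
                    without_concept: Sequence[Vector]) -> Vector:
    """Hypothetical steering direction: mean activation with the concept
    minus mean activation without it (an assumed, common construction)."""
    pos = mean_vector(with_concept)
    neg = mean_vector(without_concept)
    return [p - q for p, q in zip(pos, neg)]


def run(layers: Sequence[Layer], x: Vector,
        steer_at: Optional[int] = None,
        direction: Optional[Vector] = None,
        alpha: float = 1.0) -> Vector:
    """Run a toy layer stack, optionally adding alpha * direction to the
    output of the layer at index steer_at."""
    h = list(x)
    for i, layer in enumerate(layers):
        h = layer(h)
        if i == steer_at and direction is not None:
            h = [a + alpha * d for a, d in zip(h, direction)]
    return h


# Toy stand-ins for model layers (not real attention layers).
toy_layers: List[Layer] = [
    lambda h: [2.0 * v for v in h],  # layer 0: scale
    lambda h: [v + 1.0 for v in h],  # layer 1: shift
]

# Made-up activations recorded at layer 0 for two groups of prompts.
direction = steering_vector(
    with_concept=[[1.0, 0.0], [3.0, 0.0]],
    without_concept=[[0.0, 0.0], [0.0, 2.0]],
)  # [2.0, -1.0]

baseline = run(toy_layers, [1.0, 1.0])  # [3.0, 3.0]
steered = run(toy_layers, [1.0, 1.0], steer_at=0,
              direction=direction, alpha=0.5)  # [4.0, 2.5]
```

Changing `steer_at` moves the intervention to a different layer, which is the kind of site-versus-mechanism interaction the abstract says the paper analyzes; `alpha` scales how strongly the concept direction is pushed.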

Code & Models

Repositories

luk-st/steer-audio
github

Datasets

lukasz-staniszewski/patching-music-musiccaps-prompts
dataset · 71 dl

Taxonomy

Topics: Music Technology and Sound Studies · Music and Audio Processing · Neuroscience and Music Perception