TADA! Tuning Audio Diffusion Models through Activation Steering
Łukasz Staniszewski, Katarzyna Zaleska, Mateusz Modrzejewski, Kamil Deja

TL;DR
This paper identifies a semantic bottleneck in audio diffusion models and introduces activation steering as a novel, effective method for fine-grained control of musical attributes, outperforming prompt-level, score-space, and weight-space interventions.
Contribution
The study uncovers a bottleneck in a small, shared subset of attention layers and proposes activation steering at those layers, establishing a new state of the art in audio concept modulation.
Findings
Activation patching reveals a semantic bottleneck in attention layers.
Activation steering outperforms prompt-level, score-space, and weight-space interventions.
An extensive user study supports the effectiveness of activation steering.
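The activation-patching finding can be illustrated on a toy model. The sketch below is a hypothetical stand-in, not the paper's code or the ACE-Step architecture: a small residual stack of NumPy layers in which, by construction, only one layer (`BOTTLENECK`) reads a concept embedding. For each layer it copies that layer's sublayer output from a clean run (concept present) into a corrupted run (concept absent) and measures the fraction of the output gap that is recovered. All names (`contribution`, `run`, `patching_scores`) are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy stand-in for a diffusion transformer: a residual stack whose
# sublayer outputs ("contributions") can be cached and overwritten. By
# construction, only layer BOTTLENECK reads the concept embedding.
rng = np.random.default_rng(0)
DIM, N_LAYERS, BOTTLENECK = 8, 5, 2
WEIGHTS = [rng.normal(scale=0.1, size=(DIM, DIM)) for _ in range(N_LAYERS)]

def contribution(i, h, concept):
    """Output of sublayer i (e.g. an attention block) before the residual add."""
    delta = np.tanh(h @ WEIGHTS[i])
    if i == BOTTLENECK:
        delta = delta + concept
    return delta

def run(x, concept, patch=None):
    """Forward pass; `patch=(i, delta)` replaces sublayer i's output with `delta`.
    Returns the final hidden state and the cached sublayer outputs."""
    h, cache = x, []
    for i in range(N_LAYERS):
        delta = contribution(i, h, concept)
        if patch is not None and patch[0] == i:
            delta = patch[1]
        cache.append(delta)
        h = h + delta
    return h, cache

def patching_scores(x, concept):
    """Patch each clean sublayer output into the corrupted run and report the
    fraction of the clean-vs-corrupted output gap that it recovers."""
    clean_out, clean_cache = run(x, concept)
    no_concept = np.zeros_like(concept)
    corrupt_out, _ = run(x, no_concept)
    gap = np.linalg.norm(corrupt_out - clean_out)
    scores = []
    for i in range(N_LAYERS):
        patched_out, _ = run(x, no_concept, patch=(i, clean_cache[i]))
        scores.append(1.0 - np.linalg.norm(patched_out - clean_out) / gap)
    return np.array(scores)

x = rng.normal(size=DIM)
concept = rng.normal(size=DIM)
scores = patching_scores(x, concept)
print(np.round(scores, 3))
print("most causal layer:", int(np.argmax(scores)))
```

Only the bottleneck layer fully restores the clean output. In a real diffusion transformer, the same loop would run over attention-layer outputs captured with framework hooks, and the gap would likely be scored with a concept-level metric rather than an L2 distance. Both of those choices are assumptions here.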
Abstract
Audio diffusion models can synthesize high-fidelity music from text, yet achieving fine-grained control over specific musical attributes remains challenging, as their internal mechanisms for representing high-level concepts are poorly understood. In this work, we use activation patching to demonstrate that recent audio diffusion architectures exhibit a semantic bottleneck, where a small, shared subset of consecutive attention layers controls distinct musical concepts, such as the presence of specific instruments, vocals, or genres. Building on this, we systematically evaluate a broad spectrum of steering paradigms, comparing activation steering against prompt-level, score-space, and weight-space interventions, analyzing the interaction between the steering mechanism and the intervention site. Our new benchmark, supported by an extensive user study, demonstrates that localized activation…
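Activation steering can be sketched as follows. This sketch assumes the contrastive activation addition (CAA) formulation, which the `caa` in the released checkpoint names suggests. A steering vector is the mean layer activation over concept-present prompts minus the mean over concept-absent prompts. At generation time, `alpha` times that vector is added to the outputs of the selected layers, and a negative `alpha` pushes away from the concept. The function names, the dict-of-layers layout, and the plain NumPy arrays are hypothetical. The paper's exact layer selection, normalization, and timestep schedule are not given here.

```python
import numpy as np

def caa_steering_vector(pos_acts, neg_acts):
    """Contrastive activation addition: mean activation over concept-present
    prompts minus mean activation over concept-absent prompts.
    pos_acts, neg_acts: arrays of shape (n_prompts, dim)."""
    return np.mean(pos_acts, axis=0) - np.mean(neg_acts, axis=0)

def steer(layer_outputs, vectors, alpha):
    """Return a copy of `layer_outputs` with alpha * v added to each steered layer.
    layer_outputs: dict layer_idx -> (tokens, dim) array
    vectors:       dict layer_idx -> (dim,) steering vector
    Layers without a vector are passed through unchanged."""
    steered = dict(layer_outputs)
    for layer_idx, v in vectors.items():
        # v broadcasts over the token axis
        steered[layer_idx] = layer_outputs[layer_idx] + alpha * v
    return steered

# Hypothetical usage with synthetic activations for two layers.
pos = np.array([[1.0, 2.0], [3.0, 4.0]])  # concept present
neg = np.array([[0.0, 0.0], [2.0, 2.0]])  # concept absent
v = caa_steering_vector(pos, neg)
outputs = {5: np.zeros((3, 2)), 6: np.ones((3, 2))}
steered = steer(outputs, {5: v}, alpha=2.0)
print(v)           # [1. 2.]
print(steered[5])  # each row is [2. 4.]
```

Restricting `vectors` to a few layers mirrors the idea of steering only at the localized bottleneck rather than throughout the network.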
Code & Models
- 🤗 lukasz-staniszewski/ace-step-caa-piano (model, 47 downloads)
- 🤗 lukasz-staniszewski/ace-step-caa-guitar-electronic (model, 64 downloads)
- 🤗 lukasz-staniszewski/ace-step-caa-vocal-gender (model, 65 downloads)
- 🤗 lukasz-staniszewski/ace-step-caa-electronic-music (model, 64 downloads)
- 🤗 lukasz-staniszewski/ace-step-caa-rock-genre (model, 63 downloads)
- 🤗 lukasz-staniszewski/ace-step-caa-mood (model, 35 downloads)
- 🤗 lukasz-staniszewski/ace-step-caa-vocal-style (model, 69 downloads)
- 🤗 lukasz-staniszewski/ace-step-caa-tempo (model, 48 downloads)
- 🤗 lukasz-staniszewski/ace-step-caa-violin (model, 44 downloads)
- 🤗 lukasz-staniszewski/ace-step-sae-tf7-cross-attn (model)
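The checkpoints listed above are hosted on the Hugging Face Hub. The following is a minimal sketch for fetching them with `huggingface_hub.snapshot_download`. It assumes the package is installed and network access is available. The helper names (`repo_ids`, `download`) are illustrative, and this page does not describe how each checkpoint is loaded into ACE-Step.

```python
# Repository IDs as listed on this page (Hugging Face Hub user lukasz-staniszewski).
REPO_OWNER = "lukasz-staniszewski"
CAA_CONCEPTS = ["piano", "guitar-electronic", "vocal-gender", "electronic-music",
                "rock-genre", "mood", "vocal-style", "tempo", "violin"]

def repo_ids():
    """All listed repo IDs: one CAA checkpoint per concept plus the SAE checkpoint."""
    ids = [f"{REPO_OWNER}/ace-step-caa-{c}" for c in CAA_CONCEPTS]
    ids.append(f"{REPO_OWNER}/ace-step-sae-tf7-cross-attn")
    return ids

def download(repo_id):
    """Fetch a full repo snapshot into the local Hugging Face cache and return
    its local path. Requires `pip install huggingface_hub` and network access."""
    from huggingface_hub import snapshot_download  # imported lazily
    return snapshot_download(repo_id=repo_id)

if __name__ == "__main__":
    for rid in repo_ids():
        print(rid)  # call download(rid) to fetch the files
```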
Taxonomy
Topics: Music Technology and Sound Studies · Music and Audio Processing · Neuroscience and Music Perception
