FreeSliders: Training-Free, Modality-Agnostic Concept Sliders for Fine-Grained Diffusion Control in Images, Audio, and Video

Rotem Ezra; Hedi Zisling; Nimrod Berman; Ilan Naiman; Alexey Gorkor; Liran Nochumsohn; Eliya Nachmani; Omri Azencot

arXiv:2511.00103 · cs.CV · November 4, 2025

Open Access

TL;DR

FreeSliders is a training-free, modality-agnostic method for fine-grained concept control in diffusion models across images, audio, and video.

Contribution

The paper proposes a training-free approach to concept sliders that works across multiple modalities, extends the Concept Sliders evaluation benchmark to video and audio, and addresses scale and traversal issues.

Findings

01 Enables plug-and-play concept control without training.

02 Outperforms existing baselines in multi-modal fine-grained control.

03 Provides new evaluation metrics and tools for controllable generation.

Abstract

Diffusion models have become state-of-the-art generative models for images, audio, and video, yet enabling fine-grained controllable generation, i.e., continuously steering specific concepts without disturbing unrelated content, remains challenging. Concept Sliders (CS) offer a promising direction by discovering semantic directions through textual contrasts, but they require per-concept training and architecture-specific fine-tuning (e.g., LoRA), limiting scalability to new modalities. In this work we introduce FreeSliders, a simple yet effective approach that is fully training-free and modality-agnostic, achieved by partially estimating the CS formula during inference. To support modality-agnostic evaluation, we extend the CS benchmark to include both video and audio, establishing the first suite for fine-grained concept generation control with multiple modalities. We further propose…
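
The abstract describes steering a concept through textual contrasts, estimated at inference time rather than learned. A minimal sketch of that general idea follows, assuming the common contrastive form in which a base noise prediction is shifted along the difference between a positive-prompt and a negative-prompt prediction. The function name `slider_noise_estimate`, its parameters, and the toy arrays are hypothetical illustrations; the exact formula FreeSliders uses is not given on this page and may differ.

```python
import numpy as np

def slider_noise_estimate(eps_base, eps_pos, eps_neg, scale):
    """Shift a base noise prediction along a (positive - negative) contrast.

    Assumed form, not taken from the paper:
        eps = eps_base + scale * (eps_pos - eps_neg)
    """
    eps_base = np.asarray(eps_base, dtype=float)
    direction = np.asarray(eps_pos, dtype=float) - np.asarray(eps_neg, dtype=float)
    return eps_base + scale * direction

# Toy stand-ins for three denoiser outputs at one timestep (hypothetical values).
eps_base = np.zeros(4)
eps_pos = np.array([1.0, 0.0, 2.0, 0.0])
eps_neg = np.array([0.0, 0.0, 1.0, 0.0])

# Sweeping the scale moves the estimate continuously in either direction.
for scale in (-1.0, 0.0, 1.0):
    print(scale, slider_noise_estimate(eps_base, eps_pos, eps_neg, scale))
```

In this sketch a scale of zero leaves the base prediction untouched, and coordinates where the two contrast predictions agree are not changed at any scale, which is the toy analogue of leaving unrelated content undisturbed.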

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning