Improving Fine-Grained Control via Aggregation of Multiple Diffusion Models
Conghan Yue, Zhengwei Peng, Shiyan Du, Zhi Ji, Chuangjian Cai, Le Wan, Dongyu Zhang

TL;DR
This paper presents AMDM, a training-free algorithm that combines multiple diffusion models in the latent space to achieve fine-grained control over generated content, bypassing extensive training and dataset limitations.
Contribution
It introduces a novel, training-free method to enhance fine-grained control by aggregating features from multiple diffusion models within the same ecosystem.
Findings
AMDM significantly improves fine-grained control in diffusion models.
Diffusion models focus on position, attributes, and style in the early sampling stages, while later stages improve generation quality and consistency.
AMDM enables effective feature activation without additional training.
Abstract
While many diffusion models perform well when controlling particular aspects such as style, character, and interaction, they struggle with fine-grained control due to dataset limitations and intricate model architecture design. This paper introduces a novel training-free algorithm for fine-grained generation, called Aggregation of Multiple Diffusion Models (AMDM). The algorithm integrates features in the latent data space from multiple diffusion models within the same ecosystem into a specified model, thereby activating particular features and enabling fine-grained control. Experimental results demonstrate that AMDM significantly improves fine-grained control without training, validating its effectiveness. Additionally, it reveals that diffusion models initially focus on features such as position, attributes, and style, with later stages improving generation quality and consistency.…
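As a rough illustration of the idea in the abstract, the Python sketch below merges latents from several models during the early sampling steps and then lets the target model finish alone. It is not the authors' AMDM algorithm: the plain weighted average, the `Step` callable type, and the names `aggregate`, `sample_with_aggregation`, `weights`, and `agg_steps` are all assumptions introduced here for illustration.

```python
from typing import Callable, List, Sequence

Latent = List[float]
# Hypothetical one-step denoiser: (latent, timestep) -> updated latent.
Step = Callable[[Latent, int], Latent]


def aggregate(latents: Sequence[Latent], weights: Sequence[float]) -> Latent:
    """Weighted average of same-length latents (weights are normalized)."""
    total = sum(weights)
    return [
        sum(w * z[i] for w, z in zip(weights, latents)) / total
        for i in range(len(latents[0]))
    ]


def sample_with_aggregation(
    init: Latent,
    steps: Sequence[Step],   # steps[0] is the model we aggregate into
    weights: Sequence[float],
    num_steps: int,
    agg_steps: int,
) -> Latent:
    """Aggregate all models for the first `agg_steps` steps, then use steps[0] only."""
    z = list(init)
    for k in range(num_steps):
        t = num_steps - k  # timesteps count down from num_steps to 1
        if k < agg_steps:
            # Early phase: every model denoises the shared latent, then merge.
            z = aggregate([step(z, t) for step in steps], weights)
        else:
            # Late phase: only the target model refines the result.
            z = steps[0](z, t)
    return z
```

With real models, each `Step` would wrap one denoising update of a diffusion model that shares the same latent space as the others. The split at `agg_steps` mirrors the abstract's observation that early steps settle position, attributes, and style while later steps refine quality and consistency.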
Peer Reviews
Decision: Submitted to ICLR 2026
- The perspective of leveraging existing diffusion models to address fine-grained, controllable generation in a training-free manner is an interesting direction.
- The paper is well-written and easy to follow; empirical performance and results demonstrate the promise of the proposed method.

- "Diffusion ecosystem" is mentioned across the paper as a prerequisite of AMDM but is somewhat loosely defined; for example, it is unclear how to quantitatively verify whether any two models belong to the same ecosystem.
- Evaluation scope is limited to certain derivatives of Stable Diffusion models. Throughout the experiments, the authors only picked a few classic SD1.4/1.5 and SDXL models, but did not examine whether the findings generalize to more recent model architectures based on DiT instead o

- The authors identify a genuine limitation in current models: they excel in specific aspects but struggle with others. The paper provides a solution without requiring retraining.
- The approach is backed by mathematical analysis of why aggregation works for models in the same diffusion ecosystem.
- Extensive experiments demonstrate clear improvements in both qualitative results and quantitative metrics.
- Unlike many compositional methods that introduce significant computational overhead, AMDM has

- While the authors provide theoretical justification, the assumptions (functional proximity, conditional proximity) are somewhat heuristic and don't offer global guarantees.
- AMDM only works for models within the same diffusion ecosystem, limiting its general applicability.
- While some comparisons with compositional methods are provided, a more comprehensive comparison with other training-free approaches would strengthen the paper.
- When aggregating models, there might be unintended interact

- The method is grounded in solid theoretical analysis, particularly regarding the confidence and reliability of aggregated diffusion scores.
- The empirical evaluation is thorough, and the comparisons clearly demonstrate how AMDM improves over baselines in multiple tasks.
- The motivation is practical and relevant: enabling reuse and integration of existing specialized diffusion models without retraining.

- Although the theoretical analysis is detailed, the paper may benefit from a simple controlled or toy example to help intuitively illustrate the effect of score aggregation and deviation optimization.
- The evaluation primarily focuses on three types of models (MIGC, InteractDiffusion, and IP-Adapter). While the results are encouraging, a broader range of conditional diffusion methods or application settings would better support the generality of the method.
- One of the key claims — that diffu
Taxonomy
Topics: Neural Networks and Applications
Methods: Focus · Diffusion · ALIGN
