Denoising Task Routing for Diffusion Models
Byeongjun Park, Sangmin Woo, Hyojun Go, Jin-Young Kim, Changick Kim

TL;DR
This paper introduces Denoising Task Routing (DTR), a simple strategy to explicitly incorporate multi-task learning principles into diffusion models, improving performance and training efficiency without extra parameters.
Contribution
DTR is a novel add-on that creates task-specific pathways in diffusion models by routing each denoising task through a subset of channels, using two priors, task affinity and task weights, to improve performance and speed up convergence.
Findings
DTR boosts diffusion model performance across evaluation protocols.
DTR accelerates training convergence, achieving comparable results with fewer training iterations.
Abstract
Diffusion models generate highly realistic images by learning a multi-step denoising process, naturally embodying the principles of multi-task learning (MTL). Despite the inherent connection between diffusion models and MTL, there remains an unexplored area in designing neural architectures that explicitly incorporate MTL into the framework of diffusion models. In this paper, we present Denoising Task Routing (DTR), a simple add-on strategy for existing diffusion model architectures to establish distinct information pathways for individual tasks within a single architecture by selectively activating subsets of channels in the model. What makes DTR particularly compelling is its seamless integration of prior knowledge of denoising tasks into the framework: (1) Task Affinity: DTR activates similar channels for tasks at adjacent timesteps and shifts activated channels as sliding windows…
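The sliding-window idea described in the abstract can be illustrated with a short Python sketch. It assumes one binary mask per timestep over C channels, a hypothetical activation ratio `beta`, and a hypothetical exponent `alpha` standing in for the task-weights prior. The function names are illustrative, and the exact mask definition used in the paper may differ.

```python
import math

def sliding_window_mask(t, T, C, beta=0.5, alpha=1.0):
    """Binary channel mask for the denoising task at timestep t (0 <= t <= T).

    Hypothetical DTR-style parameterization (not a verified copy of the paper):
    a contiguous window of round(beta * C) active channels whose start index
    slides from channel 0 at t = 0 to channel C - width at t = T.
    alpha is an assumed warping exponent: alpha > 1 makes the window move
    faster at large t, so large-t (high-noise) tasks overlap less with each other.
    """
    if not 0 <= t <= T:
        raise ValueError("t must lie in [0, T]")
    width = max(1, round(beta * C))
    start = math.floor((C - width) * (t / T) ** alpha)
    return [1 if start <= c < start + width else 0 for c in range(C)]

def apply_mask(features, mask):
    # Element-wise gating of per-channel features; adds no learnable parameters.
    return [x * m for x, m in zip(features, mask)]

# Adjacent timesteps activate (almost) the same channels: the task-affinity prior.
m_a = sliding_window_mask(t=10, T=1000, C=8)
m_b = sliding_window_mask(t=11, T=1000, C=8)
print(m_a, m_b)
print(apply_mask([0.5] * 8, m_a))
```

Because the mask only multiplies existing activations, routing of this kind introduces no extra parameters, which is in line with the claim in the TL;DR.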
Peer Reviews
Decision: ICLR 2024 poster
Strengths: 1. The proposed routing mask strategy is interesting because it leverages the task similarity between adjacent timesteps. 2. The experiments conducted in this study are comprehensive and demonstrate significant performance improvements.
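The point about adjacent-timestep similarity can be made concrete by measuring channel overlap (intersection over union) between the masks of neighbouring timesteps. This sketch assumes a hypothetical sliding-window mask as a stand-in for DTR and, as the random baseline, an independent fixed random channel subset per timestep in the spirit of R-TR; all names and parameter values are illustrative.

```python
import math
import random

def sliding_window_mask(t, T, C, beta=0.5, alpha=1.0):
    # Hypothetical DTR-style mask: a window of round(beta * C) active channels
    # whose start slides from channel 0 (t = 0) to channel C - width (t = T).
    width = max(1, round(beta * C))
    start = math.floor((C - width) * (t / T) ** alpha)
    return [1 if start <= c < start + width else 0 for c in range(C)]

def random_mask(t, C, beta=0.5, seed=0):
    # Random task routing: an independent random channel subset per task,
    # seeded deterministically so each task keeps the same mask throughout training.
    width = max(1, round(beta * C))
    rng = random.Random(seed * 1_000_003 + t)
    active = set(rng.sample(range(C), width))
    return [1 if c in active else 0 for c in range(C)]

def iou(a, b):
    # Intersection over union of two binary masks.
    inter = sum(x & y for x, y in zip(a, b))
    union = sum(x | y for x, y in zip(a, b))
    return inter / union if union else 1.0

def mean_adjacent_iou(mask_fn, T):
    # Average overlap between the masks of timesteps t and t + 1.
    scores = [iou(mask_fn(t), mask_fn(t + 1)) for t in range(T)]
    return sum(scores) / len(scores)

C, T = 64, 1000
dtr = mean_adjacent_iou(lambda t: sliding_window_mask(t, T, C), T)
rtr = mean_adjacent_iou(lambda t: random_mask(t, C), T)
print(f"sliding window: {dtr:.3f}  random: {rtr:.3f}")
```

With these settings the sliding window keeps nearly all active channels shared between neighbouring tasks, while independent random masks share only roughly a third of them, so random routing does not encode task affinity at all.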
Weaknesses: 1. The idea of viewing diffusion models as multi-task learning was previously proposed by Hang et al. (2023) and Go et al. (2023a). The proposed masking strategy is a simple modification of TR (Strezoski et al., 2019), so its novelty is limited. 2. The paper lacks an ablation study evaluating the necessity of the proposed masking strategy. Ding et al. (2023) propose dividing channels into shared channels and task-specific channels. Assigning each time-step cluster (Go et al., 2023…
Strengths: * The paper effectively addresses the negative transfer phenomenon by establishing task-specific pathways for multiple denoising tasks. The concept of integrating key prior knowledge about diffusion into task routing is well presented and could influence future work on architecture design for diffusion models. * The implementation, although simple, is effective and yields significant performance gains on multiple benchmarks. * The paper is well structured, making it easy to understand and…
Weaknesses: * The empirical analysis could be more comprehensive in decoupling the contributions of task weights and task affinity. As I understand it, the results in Figure 4 only ablate the synergy of the two priors. To study the direct contribution of **Task Weights**, it would be helpful to compare `DTR with random routing but task-dedicated allocation channels` with `Random Task Routing (R-TR)`. Similarly, to study the contribution of **Task Affinity**, a comparison between `DTR with t
Strengths: 1. This paper proposes a simple yet effective add-on strategy for existing diffusion model architectures that introduces no additional parameters and accelerates convergence during training. 2. Extensive experiments demonstrate the effectiveness and efficiency of the proposed method. 3. The paper is well written and easy to follow.
Weaknesses: 1. Some advanced routing methods [1, 2] improve on random routing by considering inter-task relationships, so the paper should discuss and compare against them. 2. In Figure 9, the images generated by the baseline (first row) look very strange, and both R-TR and DTR alleviate this (second and third rows). So why does the random routing method work well? In particular, in the fifth case/column, the image generated by R-TR looks better than the one generated by…
Taxonomy
Topics: Computer Graphics and Visualization Techniques
Methods: Focus · Diffusion
