MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models
Donghao Zhou, Jiancheng Huang, Jinbin Bai, Jiaze Wang, Hao Chen, Guangyong Chen, Xiaowei Hu, Pheng-Ann Heng

TL;DR
MagicTailor introduces a novel framework for fine-grained, component-level personalization in text-to-image diffusion models, overcoming semantic pollution and imbalance to enable more customizable and creative image generation.
Contribution
We propose MagicTailor, a new framework with Dynamic Masked Degradation and Dual-Stream Balancing to improve component-controllable personalization in diffusion models.
Findings
Outperforms existing methods in personalized image generation
Effectively reduces semantic pollution and imbalance
Enhances user control over visual components
Abstract
Text-to-image diffusion models can generate high-quality images but lack fine-grained control of visual concepts, limiting their creativity. Thus, we introduce component-controllable personalization, a new task that enables users to customize and reconfigure individual components within concepts. This task faces two challenges: semantic pollution, where undesired elements disrupt the target concept, and semantic imbalance, which causes disproportionate learning of the target concept and component. To address these, we design MagicTailor, a framework that uses Dynamic Masked Degradation to adaptively perturb unwanted visual semantics and Dual-Stream Balancing for more balanced learning of desired visual semantics. The experimental results show that MagicTailor achieves superior performance in this task and enables more personalized and creative image generation.
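The abstract describes Dynamic Masked Degradation only at a high level: unwanted visual semantics are perturbed adaptively. A minimal sketch of that idea follows. It assumes the perturbation is Gaussian noise added outside a binary target mask, with an intensity that decays linearly over training. The function names, the linear schedule, and the noise model are illustrative assumptions, not the authors' implementation.

```python
import random


def degradation_weight(step, total_steps, w_start=1.0, w_end=0.0):
    """Hypothetical schedule: linearly decay the noise intensity over training."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return w_start + (w_end - w_start) * frac


def masked_degrade(image, mask, weight, rng):
    """Add Gaussian noise to pixels outside the mask; keep masked pixels as-is.

    image: 2D list of floats (grayscale, for simplicity).
    mask:  2D list of 0/1 flags, where 1 marks the region to preserve.
    """
    return [
        [
            pixel if keep else pixel + weight * rng.gauss(0.0, 1.0)
            for pixel, keep in zip(img_row, mask_row)
        ]
        for img_row, mask_row in zip(image, mask)
    ]


# Toy usage: a 4x4 image whose central 2x2 block is the target region.
rng = random.Random(0)
image = [[0.5] * 4 for _ in range(4)]
mask = [[1 if 1 <= r <= 2 and 1 <= c <= 2 else 0 for c in range(4)] for r in range(4)]

early = masked_degrade(image, mask, degradation_weight(0, 100), rng)    # strong noise
late = masked_degrade(image, mask, degradation_weight(100, 100), rng)   # no noise
```

In this sketch the region inside the mask is never altered, while everything outside it is degraded heavily early in training and left untouched by the end. Whether the actual method uses this schedule or this noise model is not stated in the text above.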
Peer Reviews
Decision · ICLR 2025 Conference Withdrawn Submission
**1. Task Definition:** Unlike prior works that usually focus on multi-concept personalization, this paper uniquely targets component-controllable personalization, which separates the concept and the component and allows control over specific components (e.g., a person's eyebrow) within a larger concept (e.g., the person). **2. Problem Statement:** The authors clearly depict the challenges they aim to solve (semantic pollution and imbalance), contributing to the community by highlighting a new
**1. Complexity of Implementation:** The proposed methods, DS-Bal and DM-Deg, seem complex to implement and computationally demanding, requiring substantial resources. **2. Limited Generalizability:** The paper primarily shows results in controlled or geometrically similar scenarios (e.g., where the layout in concept and composition is consistent), which raises questions about how the method would perform under more complex or varied conditions.
1. The writing of the paper is clear and easy to follow. 2. This work defines a task named component-controllable personalization, which aims at more precise and fine-grained customization for T2I models, and this may be of interest to the community in a broader context. 3. The overall structure of the article is clear. Two challenges of component-controllable personalization are discussed, and visual effects are illustrated for better comprehension. Further, DM-Deg and DS-Bal are proposed resp
1. The designed method is somewhat complicated. It seems that undesired conflicts may occur during the learning processes of different concepts, so the authors present multiple carefully designed constraints for different 'unfortunate' occasions. I wonder whether the proposed methods generalize well to different model architectures. Would the hyperparameters need to be carefully chosen again? As a personalization method, is it compatible with some trained LoRAs (e.g. cartoon, realistic)
1. The authors present diverse quantitative and qualitative comparisons. They address the problem holistically through ablation studies, user studies, sensitivity studies for hyperparameters, etc. 2. The idea of customizing components and combining them in a newly generated image is relatively new.
1. The semantic pollution and semantic imbalance issues are too similar and not clearly disentangled. I am not convinced that the authors should treat them as different problems. 2. There is a lack of evidence that the two aforementioned problems occur. Only a couple of examples are shown in support. 3. The Dynamic Masked Degradation and Dual-Stream Balancing solutions also seem very similar. They use similar masking approaches. The separate approach of Sample-wise Min-Max Optimization and Selective Preservi
Taxonomy
Topics: Image Retrieval and Classification Techniques · Video Analysis and Summarization · Multimedia Communication and Technology
Methods: Diffusion
