TL;DR
This paper introduces Shortcut-Rerouted Adapter Training, a method that prevents adapters from entangling target attributes with incidental factors by routing confounding factors through auxiliary modules, improving image synthesis quality and diversity.
Contribution
The paper proposes a training approach that routes confounding factors, such as pose, expression, and lighting, through auxiliary modules so that the adapter is not pushed to encode them, promoting disentangled representations in adapter-based image generation.
Findings
Improved generation quality and diversity.
Enhanced prompt adherence.
Effective disentanglement of target attributes from incidental factors.
Abstract
Adapter-based training has emerged as a key mechanism for extending the capabilities of powerful foundation image generators, enabling personalized and stylized text-to-image synthesis. These adapters are typically trained to capture a specific target attribute, such as subject identity, using single-image reconstruction objectives. However, because the input image inevitably contains a mixture of visual factors, adapters are prone to entangle the target attribute with incidental ones, such as pose, expression, and lighting. This spurious correlation problem limits generalization and obstructs the model's ability to adhere to the input text prompt. In this work, we uncover a simple yet effective solution: provide the very shortcuts we wish to eliminate during adapter training. In Shortcut-Rerouted Adapter Training, confounding factors are routed through auxiliary modules, such as…
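The routing idea in the abstract can be sketched in a few lines of plain Python. This is a toy illustration rather than the paper's implementation: the class `ShortcutReroutedGenerator`, its methods, and the vector-valued "modules" are all hypothetical stand-ins, and the sketch assumes the auxiliary path is dropped at inference, which the truncated abstract above does not state.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]
Module = Callable[[Vector], Vector]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]


def mse(pred: Vector, target: Vector) -> float:
    """Mean squared error, standing in for the reconstruction objective."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


@dataclass
class ShortcutReroutedGenerator:
    """Hypothetical toy model; names and structure are assumptions."""

    base: Module       # stand-in for the frozen foundation generator
    adapter: Module    # meant to capture the target attribute (e.g. identity)
    auxiliary: Module  # carries confounders (e.g. pose, lighting) in training

    def training_forward(
        self, prompt: Vector, reference: Vector, confounders: Vector
    ) -> Vector:
        # Confounding factors get their own path, so the reconstruction
        # loss can be satisfied without the adapter encoding them.
        out = add(self.base(prompt), self.adapter(reference))
        return add(out, self.auxiliary(confounders))

    def inference_forward(self, prompt: Vector, reference: Vector) -> Vector:
        # Assumption: the auxiliary path is removed, so only the prompt
        # and the adapter's target attribute shape the output.
        return add(self.base(prompt), self.adapter(reference))


# Toy usage with hand-picked linear "modules" (illustrative values only).
model = ShortcutReroutedGenerator(
    base=lambda p: list(p),
    adapter=lambda r: [0.5 * v for v in r],
    auxiliary=lambda c: [2.0 * v for v in c],
)
train_out = model.training_forward([1.0, 2.0], [2.0, 4.0], [1.0, 1.0])
infer_out = model.inference_forward([1.0, 2.0], [2.0, 4.0])
loss = mse(train_out, [4.0, 6.0])
```

The design point is the asymmetry between the two forward passes: the confounder input exists only in `training_forward`, so nothing the auxiliary module absorbs can leak into the inference output.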