ViscoNet: Bridging and Harmonizing Visual and Textual Conditioning for ControlNet

Soon Yau Cheong; Armin Mustafa; Andrew Gilbert

arXiv:2312.03154 · cs.CV · September 5, 2024 · 1 citation

TL;DR

ViscoNet is a lightweight, novel architecture that effectively combines spatial and visual conditioning in text-to-image models, addressing mode collapse and enhancing versatility in human image generation tasks.

Contribution

Introduces ViscoNet, a one-branch-adapter architecture that preserves the generative power of the frozen text-to-image backbone while requiring far fewer trainable parameters and a much smaller training dataset than IP-Adapter, and that effectively addresses mode collapse.

Findings

01. Outperforms existing methods in visual-text harmony

02. Reduces training parameters and dataset requirements

03. Excels in diverse human image generation tasks

Abstract

This paper introduces ViscoNet, a novel one-branch-adapter architecture for concurrent spatial and visual conditioning. Our lightweight model requires a number of trainable parameters and a dataset size that are both multiple orders of magnitude smaller than those of the current state-of-the-art IP-Adapter. Nevertheless, our method successfully preserves the generative power of the frozen text-to-image (T2I) backbone. Notably, it excels in addressing mode collapse, a pervasive issue previously overlooked. Our novel architecture demonstrates outstanding capabilities in achieving a harmonious visual-text balance, unlocking unparalleled versatility in various human image generation tasks, including pose re-targeting, virtual try-on, stylization, person re-identification, and textile transfer. Demo and code are available from the project page at https://soon-yau.github.io/visconet/.

Code & Models

Repositories

soon-yau/visconet · PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning

Methods: Diffusion · Latent Diffusion Model