Efficient Text-Guided Convolutional Adapter for the Diffusion Model

Aryan Das; Koushik Biswas; Swalpa Kumar Roy; Badri Narayana Patro; Vinay Kumar Verma

arXiv:2602.14514 · cs.CV · February 23, 2026

Open Access

TL;DR

The paper presents Nexus Adapters, efficient prompt-guided adapters for diffusion models that improve structure-preserving conditional image generation with fewer parameters and enhanced multimodal understanding.

Contribution

Introduction of Nexus Prime and Slim adapters that are prompt-guided, multimodal, and parameter-efficient, improving structure-preserving diffusion-based image generation.

Findings

01. Nexus Prime significantly improves performance with only 8M extra parameters.

02. Nexus Slim achieves state-of-the-art results with 18M fewer parameters.

03. Adapters effectively incorporate prompt and structural input understanding.

Abstract

We introduce the Nexus Adapters, novel text-guided, efficient adapters for diffusion-based frameworks that target Structure-Preserving Conditional Generation (SPCG). Recently, structure-preserving methods have achieved promising results in conditional image generation by using a base model for prompt conditioning and an adapter for structural input, such as sketches or depth maps. These approaches are highly inefficient and sometimes require an adapter with as many parameters as the base architecture. Training such a model is not always feasible: the diffusion model is itself costly, and doubling the parameter count is highly inefficient. Moreover, in these approaches the adapter is not aware of the input prompt; it is therefore optimal only for the structural input, not for the input prompt. To overcome these challenges, we propose two efficient adapters, Nexus Prime and Slim, which…
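The abstract's efficiency argument comes down to simple parameter arithmetic. The sketch below shows one way to quantify it, assuming a purely convolutional adapter. The helper names (`conv2d_params`, `relative_overhead`), the base-model size, and the layer shapes are hypothetical values chosen for illustration; they are not taken from the paper and do not describe the Nexus architecture.

```python
def conv2d_params(in_ch: int, out_ch: int, kernel: int, bias: bool = True) -> int:
    """Parameter count of a standard 2D convolution with a square kernel."""
    weights = in_ch * out_ch * kernel * kernel
    return weights + (out_ch if bias else 0)


def relative_overhead(base_params: int, adapter_params: int) -> float:
    """Adapter size as a fraction of the base model size."""
    return adapter_params / base_params


# Hypothetical base-model size, for illustration only (not from the paper).
BASE_PARAMS = 1_000_000_000

# An adapter that mirrors the base architecture doubles the total parameter count.
full_copy_overhead = relative_overhead(BASE_PARAMS, BASE_PARAMS)

# A hypothetical lightweight adapter: four 3x3 convolutions at 320 channels.
light_adapter_params = 4 * conv2d_params(320, 320, 3)
light_overhead = relative_overhead(BASE_PARAMS, light_adapter_params)

print(f"full-copy adapter overhead:   {full_copy_overhead:.1%}")
print(f"lightweight adapter overhead: {light_overhead:.2%}")
```

Under these assumed numbers, the mirrored adapter adds as many parameters as the base model itself, while the small convolutional stack adds well under one percent, which is the kind of gap the abstract describes.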

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques · Multimodal Machine Learning Applications