Pinco: Position-induced Consistent Adapter for Diffusion Transformer in Foreground-conditioned Inpainting

Guangben Lu; Yuzhen Du; Zhimin Sun; Ran Yi; Yifan Qi; Yizhe Tang; Tianyi Wang; Lizhuang Ma; Fangyuan Zou

arXiv:2412.03812 · cs.CV · August 7, 2025

TL;DR

Pinco is a plug-and-play adapter for diffusion transformers aimed at foreground-conditioned inpainting: it generates a background around a given foreground subject while preserving the subject's shape and keeping the background aligned with the text description.

Contribution

The paper proposes a plug-and-play adapter with a self-consistent attention mechanism, decoupled feature extraction, and shared positional embeddings for improved foreground inpainting.

Findings

01. Achieves better shape preservation of foreground subjects.

02. Enhances alignment between generated background and text descriptions.

03. Improves training efficiency and overall inpainting quality.

Abstract

Foreground-conditioned inpainting aims to seamlessly fill the background region of an image by utilizing the provided foreground subject and a text description. While existing T2I-based image inpainting methods can be applied to this task, they suffer from issues of subject shape expansion, distortion, or impaired ability to align with the text description, resulting in inconsistencies between the visual elements and the text description. To address these challenges, we propose Pinco, a plug-and-play foreground-conditioned inpainting adapter that generates high-quality backgrounds with good text alignment while effectively preserving the shape of the foreground subject. Firstly, we design a Self-Consistent Adapter that integrates the foreground subject features into the layout-related self-attention layer, which helps to alleviate conflicts between the text and subject features by…
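
The idea of feeding foreground subject features into a self-attention layer can be pictured with a small sketch. This is an illustration only, not the authors' implementation: it assumes a single attention head, identity projections in place of learned query/key/value weights, and subject tokens that reuse the positional embedding of the matching latent tokens. The names `attend`, `add_pos`, and `subject_aware_self_attention` are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention on plain Python lists."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return out


def add_pos(tokens, pos):
    """Add a positional embedding to each token, element-wise."""
    return [[t + p for t, p in zip(tok, pe)] for tok, pe in zip(tokens, pos)]


def subject_aware_self_attention(latent, subject, pos):
    """Latent tokens attend to themselves and to subject tokens.

    Assumptions (not from the paper): identity projections, one head,
    and the same positional embedding `pos` applied to both token sets.
    """
    queries = add_pos(latent, pos)
    keys = queries + add_pos(subject, pos)  # latent keys, then subject keys
    values = latent + subject               # matching value order
    return attend(queries, keys, values)
```

Because the subject tokens enter through the keys and values of the self-attention layer, they compete with the latent tokens inside one softmax rather than being mixed in through a separate text-conditioned pathway. That is the design choice the sketch is meant to make visible.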

Taxonomy

Topics: Computer Graphics and Visualization Techniques · Generative Adversarial Networks and Image Synthesis

Methods: Softmax · Attention Is All You Need · Adapter · Inpainting · ALIGN · Focus