AnyRefill: A Unified, Data-Efficient Framework for Left-Prompt-Guided Vision Tasks

Ming Xie; Chenjie Cao; Yunuo Cai; Xiangyang Xue; Yu-Gang Jiang; Yanwei Fu

arXiv:2502.11158 · cs.CV · February 19, 2025

TL;DR

AnyRefill is a unified, data-efficient framework that combines a left-prompt-guided reformulation with inpainting priors to address diverse reference-based vision tasks without extra visual encoders.

Contribution

The paper proposes AnyRefill, an extension of LeftRefill that adapts Text-to-Image (T2I) models to multiple vision tasks through left-right image stitching and minimal fine-tuning.
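The left-right stitching idea can be illustrated with a minimal sketch: the reference image is placed on the left of a wider canvas, the target on the right, and a mask marks which right-half pixels an inpainting model should generate. This assumes grayscale images stored as nested lists and a mask convention of 1 = generate, 0 = keep as context; the function names `stitch_left_right` and `crop_right` are hypothetical and are not taken from the paper's code.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: a grayscale image as a list of rows (H x W).
Image = List[List[float]]
Mask = List[List[int]]


def stitch_left_right(
    reference: Image,
    target: Image,
    target_mask: Optional[Mask] = None,
    fill: float = 0.0,
) -> Tuple[Image, Mask]:
    """Place the reference on the left and the target on the right.

    Mask convention (an assumption): 1 marks pixels the inpainting model
    should generate, 0 marks pixels kept as context.
    """
    if len(reference) != len(target):
        raise ValueError("reference and target must have the same height")
    ref_w, tgt_w = len(reference[0]), len(target[0])
    if target_mask is None:
        # Generation-style task: the whole right half is to be filled.
        target_mask = [[1] * tgt_w for _ in target]
    canvas: Image = []
    mask: Mask = []
    for ref_row, tgt_row, m_row in zip(reference, target, target_mask):
        # Blank the pixels to be generated; unmasked target pixels stay as context.
        right = [fill if m else v for v, m in zip(tgt_row, m_row)]
        canvas.append(list(ref_row) + right)
        # The left (reference) half is never masked.
        mask.append([0] * ref_w + list(m_row))
    return canvas, mask


def crop_right(canvas: Image, ref_width: int) -> Image:
    """Recover the right-half prediction after inpainting."""
    return [row[ref_width:] for row in canvas]
```

Because the reference lives in the same canvas as the region being filled, a pretrained inpainting model can attend to it directly, which is consistent with the paper's claim that no additional visual encoder is needed.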

Findings

01 Outperforms other image-condition injection methods.

02 Achieves results competitive with state-of-the-art open-source tools.

03 Maintains high performance with minimal task-specific fine-tuning.

Abstract

In this paper, we present a novel Left-Prompt-Guided (LPG) paradigm to address a diverse range of reference-based vision tasks. Inspired by the human creative process, we reformulate these tasks using a left-right stitching formulation to construct contextual input. Building upon this foundation, we propose AnyRefill, an extension of LeftRefill, that effectively adapts Text-to-Image (T2I) models to various vision tasks. AnyRefill leverages the inpainting priors of an advanced T2I model based on the Diffusion Transformer (DiT) architecture, and incorporates flexible components to enhance its capabilities. By combining task-specific LoRAs with the stitching input, AnyRefill unlocks its potential across diverse tasks, including conditional generation, visual perception, and image editing, without requiring additional visual encoders. Meanwhile, AnyRefill exhibits remarkable data efficiency,…
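The abstract pairs the stitched input with task-specific LoRAs on a shared T2I backbone. The sketch below illustrates that idea using the standard LoRA update y = Wx + (alpha / r) · BAx on a single toy linear layer: the base weight W stays frozen, and a small per-task adapter pair (A, B) is selected by name. The class `MultiLoRALinear`, its methods, and task keys such as `"depth"` are hypothetical illustrations rather than AnyRefill's released code; which DiT layers the paper adapts, and at what rank, is not stated here.

```python
from typing import Dict, List, Optional, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class MultiLoRALinear:
    """A frozen linear layer with switchable per-task low-rank adapters (hypothetical)."""

    def __init__(self, weight: Matrix, alpha: float = 1.0) -> None:
        self.weight = weight  # frozen base weight, shape (d_out, d_in)
        self.alpha = alpha
        self.adapters: Dict[str, Tuple[Matrix, Matrix]] = {}

    def add_adapter(self, task: str, a: Matrix, b: Matrix) -> None:
        # a: (r, d_in) down-projection, b: (d_out, r) up-projection
        rank = len(a)
        if any(len(row) != rank for row in b):
            raise ValueError("b must have r columns")
        self.adapters[task] = (a, b)

    def forward(self, x: Vector, task: Optional[str] = None) -> Vector:
        y = matvec(self.weight, x)
        if task is None:
            return y  # base model only, no adapter applied
        a, b = self.adapters[task]
        scale = self.alpha / len(a)  # alpha / r
        delta = matvec(b, matvec(a, x))  # low-rank update B(Ax)
        return [yi + scale * di for yi, di in zip(y, delta)]
```

Only the small (A, B) pair differs between tasks, so adding a task means training and storing a few low-rank matrices while the backbone is reused unchanged; this is one plausible reading of how task-specific LoRAs keep fine-tuning minimal.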
