AnyRefill: A Unified, Data-Efficient Framework for Left-Prompt-Guided Vision Tasks
Ming Xie, Chenjie Cao, Yunuo Cai, Xiangyang Xue, Yu-Gang Jiang, Yanwei Fu

TL;DR
AnyRefill introduces a unified, data-efficient framework leveraging left-prompt-guided reformulation and inpainting priors to effectively address diverse reference-based vision tasks without extra visual encoders.
Contribution
The paper proposes AnyRefill, an extension of LeftRefill that adapts Text-to-Image models to multiple vision tasks using a left-right stitching approach and minimal fine-tuning.
Findings
Outperforms other image condition injection methods.
Achieves competitive results with state-of-the-art open-source tools.
Maintains high performance with minimal task-specific fine-tuning.
Abstract
In this paper, we present a novel Left-Prompt-Guided (LPG) paradigm to address a diverse range of reference-based vision tasks. Inspired by the human creative process, we reformulate these tasks using a left-right stitching formulation to construct contextual input. Building upon this foundation, we propose AnyRefill, an extension of LeftRefill that effectively adapts Text-to-Image (T2I) models to various vision tasks. AnyRefill leverages the inpainting priors of an advanced T2I model based on the Diffusion Transformer (DiT) architecture and incorporates flexible components to enhance its capabilities. By combining task-specific LoRAs with the stitching input, AnyRefill unlocks its potential across diverse tasks, including conditional generation, visual perception, and image editing, without requiring additional visual encoders. Meanwhile, AnyRefill exhibits remarkable data efficiency,…
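The left-right stitching idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `stitch_left_right`, the equal-width halves, and the fully masked right half are all assumptions made for illustration. The abstract only states that a stitched input is built and that inpainting priors are used.

```python
import numpy as np

def stitch_left_right(reference):
    """Build a left-right stitched canvas and an inpainting mask.

    Hypothetical helper, not from the paper.
    reference: H x W x C array placed on the left half as the visual prompt.
    Returns (canvas, mask):
      canvas is H x 2W x C with the right half zeroed,
      mask is H x 2W with 1 where content is to be generated (right half).
    """
    h, w, c = reference.shape
    canvas = np.zeros((h, 2 * w, c), dtype=reference.dtype)
    canvas[:, :w] = reference           # left half: reference (prompt) image
    mask = np.zeros((h, 2 * w), dtype=np.uint8)
    mask[:, w:] = 1                     # right half: region to inpaint (assumed)
    return canvas, mask

# Usage: a dummy 4 x 3 RGB reference filled with the value 7
reference = np.full((4, 3, 3), 7, dtype=np.uint8)
canvas, mask = stitch_left_right(reference)
```

Under these assumptions, an inpainting model would receive `canvas` and `mask` and fill the right half while attending to the left half, which is how the reference can condition the output without a separate visual encoder.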
Taxonomy
Topics: CCD and CMOS Imaging Sensors · Advanced Memory and Neural Computing · EEG and Brain-Computer Interfaces
Methods: Attention Is All You Need · Byte Pair Encoding · Layer Normalization · Residual Connection · Linear Layer · Dense Connections · Multi-Head Attention · Diffusion · Position-Wise Feed-Forward Layer · Adam
