ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation
Yifan Pu, Yiming Zhao, Zhicong Tang, Ruihong Yin, Haoxing Ye, Yuhui, Yuan, Dong Chen, Jianmin Bao, Sirui Zhang, Yanbin Wang, Lin Liang, Lijuan, Wang, Ji Li, Xiu Li, Zhouhui Lian, Gao Huang, Baining Guo

TL;DR
ART introduces a novel transformer-based method for generating variable multi-layer transparent images from text prompts, enabling efficient, high-quality, and editable layered image creation with reduced computational costs.
Contribution
The paper proposes the Anonymous Region Transformer (ART), a new approach that uses anonymous region layouts and layer-wise cropping to improve multi-layer image generation efficiency and quality.
Findings
Over 12 times faster than full attention methods
Supports over 50 distinct image layers
Enables precise control and scalable layer generation
Abstract
Multi-layer image generation is a fundamental task that enables users to isolate, select, and edit specific image layers, thereby revolutionizing interactions with generative models. In this paper, we introduce the Anonymous Region Transformer (ART), which facilitates the direct generation of variable multi-layer transparent images based on a global text prompt and an anonymous region layout. Inspired by Schema theory suggests that knowledge is organized in frameworks (schemas) that enable people to interpret and learn from new information by linking it to prior knowledge.}, this anonymous region layout allows the generative model to autonomously determine which set of visual tokens should align with which text tokens, which is in contrast to the previously dominant semantic layout for the image generation task. In addition, the layer-wise region crop mechanism, which only selects the…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Steganography and Watermarking Techniques · Digital Media Forensic Detection · Chaos-based Image/Signal Encryption
MethodsAttention Is All You Need · Absolute Position Encodings · Dense Connections · Linear Layer · Layer Normalization · Byte Pair Encoding · Residual Connection · Label Smoothing · Multi-Head Attention · Position-Wise Feed-Forward Layer
