The Image Local Autoregressive Transformer
Chenjie Cao, Yuxin Hong, Xiang Li, Chengrong Wang, Chengming Xu, XiangYang Xue, Yanwei Fu

TL;DR
The paper introduces iLAT, a local autoregressive transformer model that improves local image editing by learning local discrete representations, addressing issues of global information loss and slow inference in existing AR models.
Contribution
iLAT combines an attention mask with convolution in a local autoregressive transformer, so that local image regions can be synthesized efficiently from guidance information for local image editing tasks.
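To make the attention-mask idea concrete, here is a minimal sketch of one way a "local autoregressive" mask could be built. It is not taken from the paper: the function name `local_ar_mask`, the flattened token layout, and the exact visibility rule are assumptions chosen for illustration. The assumed rule is that tokens outside the edited region (global tokens) are visible to every query, while tokens inside the edited region (local tokens) are visible only to local queries at the same or a later position. Under that rule, global tokens never read from the edited region, which is one way to avoid leaking local guidance into the rest of the image.

```python
def local_ar_mask(is_local):
    """Build a boolean attention mask for a flattened token sequence.

    is_local[i] is True when token i lies in the region being edited.
    mask[q][k] is True when query token q may attend to key token k.
    The visibility rule below is an illustrative assumption, not the
    paper's confirmed definition.
    """
    n = len(is_local)
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        for k in range(n):
            if not is_local[k]:
                # Global (unedited) keys are visible to every query.
                mask[q][k] = True
            else:
                # Local keys are visible only to local queries, causally.
                mask[q][k] = bool(is_local[q]) and k <= q
    return mask


if __name__ == "__main__":
    # Hypothetical 4-token sequence whose two middle tokens are edited.
    for row in local_ar_mask([False, True, True, False]):
        print(["x" if allowed else "." for allowed in row])
```

In a real transformer the mask would be applied to the attention logits before the softmax. It is written with plain lists here only to keep the sketch dependency-free.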
Findings
iLAT outperforms existing models in local image synthesis tasks.
The model effectively preserves global information during local editing.
Quantitative and qualitative results demonstrate its efficacy.
Abstract
Recently, AutoRegressive (AR) models for whole-image generation, empowered by transformers, have achieved performance comparable to or even better than Generative Adversarial Networks (GANs). Unfortunately, directly applying such AR models to edit or change local image regions may suffer from missing global information, slow inference speed, and information leakage of local guidance. To address these limitations, we propose a novel model, the image Local Autoregressive Transformer (iLAT), to better facilitate locally guided image synthesis. iLAT learns novel local discrete representations through the newly proposed local autoregressive (LA) transformer, which uses an attention mask and a convolution mechanism. iLAT can thus efficiently synthesize local image regions from key guidance information. iLAT is evaluated on various locally guided image synthesis tasks, such as…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Advanced Vision and Imaging
Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam · Label Smoothing · Convolution · Residual Connection · Dense Connections · Softmax
