The Image Local Autoregressive Transformer

Chenjie Cao; Yuxin Hong; Xiang Li; Chengrong Wang; Chengming Xu; XiangYang Xue; Yanwei Fu

arXiv:2106.02514 · cs.CV · October 19, 2021 · 1 citation

TL;DR

The paper introduces iLAT, a local autoregressive transformer model that improves local image editing by learning local discrete representations, addressing issues of global information loss and slow inference in existing AR models.

Contribution

iLAT combines an attention mask with a convolution mechanism so that local image regions can be synthesized efficiently from guidance information, which benefits local image editing tasks.

Findings

1. iLAT outperforms existing models in local image synthesis tasks.
2. The model effectively preserves global information during local editing.
3. Quantitative and qualitative results demonstrate its efficacy.

Abstract

Recently, AutoRegressive (AR) models for whole-image generation, empowered by transformers, have achieved performance comparable to or even better than that of Generative Adversarial Networks (GANs). Unfortunately, directly applying such AR models to edit or change local image regions may suffer from missing global information, slow inference speed, and information leakage of local guidance. To address these limitations, we propose a novel model, the image Local Autoregressive Transformer (iLAT), to better facilitate locally guided image synthesis. iLAT learns novel local discrete representations through the newly proposed local autoregressive (LA) transformer, which combines an attention mask with a convolution mechanism. iLAT can thus efficiently synthesize local image regions from key guidance information. iLAT is evaluated on various locally guided image synthesis tasks, such as…
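
The abstract does not spell out how the LA attention mask is built, so the following is only a minimal sketch of one plausible reading, not the authors' construction. It assumes a flattened token sequence in which some tokens belong to the region being synthesized ("local") and the rest are known. The function name `local_ar_mask` and the exact visibility rules are hypothetical: every query may attend to known tokens, so global context is kept; local tokens are visible only to local queries at the same or a later position, so local generation stays autoregressive; known queries never attend to local tokens.

```python
def local_ar_mask(is_local):
    """Build a boolean attention mask for a flattened token sequence.

    is_local[i] is True when token i lies in the region to synthesize.
    Returns allowed, where allowed[i][j] is True if query i may attend
    to key j. The rules below are an illustrative assumption, not the
    mask defined in the paper.
    """
    flags = [bool(x) for x in is_local]
    n = len(flags)
    allowed = []
    for i in range(n):
        row = []
        for j in range(n):
            key_is_known = not flags[j]
            # Local keys are visible only to local queries, and only causally.
            causal_local = flags[i] and flags[j] and j <= i
            row.append(key_is_known or causal_local)
        allowed.append(row)
    return allowed


if __name__ == "__main__":
    # Tokens 1 and 2 form the region to synthesize; 0 and 3 are known.
    for row in local_ar_mask([False, True, True, False]):
        print("".join("1" if a else "0" for a in row))
```

Under these assumptions, known tokens keep the same attention pattern regardless of what is generated, so only the local tokens need to be decoded step by step, which is the kind of saving the abstract attributes to local synthesis.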

Code & Models

Repositories

ewrfcas/iLAT (PyTorch)

Videos

The Image Local Autoregressive Transformer · SlidesLive

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Advanced Vision and Imaging

Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam · Label Smoothing · Convolution · Residual Connection · Dense Connections · Softmax