ASSET: Autoregressive Semantic Scene Editing with Transformers at High Resolutions

Difan Liu; Sandesh Shetty; Tobias Hinz; Matthew Fisher; Richard Zhang; Taesung Park; Evangelos Kalogerakis

arXiv:2205.12231 · cs.CV · May 25, 2022 · 5 cites

TL;DR

ASSET introduces a transformer-based neural architecture that efficiently edits high-resolution images by sparsifying its high-resolution attention under the guidance of dense attention computed at lower resolution, enabling realistic scene modifications and long-range interactions.

Contribution

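The paper proposes a sparse attention mechanism for transformers in which dense attention computed at low resolution determines which positions each token attends to at high resolution. This keeps high-resolution semantic scene editing computationally tractable without restricting attention to fixed local regions.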
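To make the efficiency argument concrete, this short Python sketch counts the query-key scores one attention layer computes with dense attention versus a fixed per-query key budget. The token-grid size (64 × 64) and the budget (256 keys per query) are hypothetical numbers chosen for illustration, not values reported in the paper, and causal masking is ignored.

```python
from typing import Optional


def attention_scores(num_tokens: int, keys_per_query: Optional[int] = None) -> int:
    """Count query-key scores computed by one attention layer (no causal mask)."""
    if keys_per_query is None:
        # Dense attention: every query scores every key.
        return num_tokens * num_tokens
    # Sparse attention: every query scores a fixed budget of keys.
    return num_tokens * min(keys_per_query, num_tokens)


# Hypothetical sizes for illustration only (not from the paper).
high_res_tokens = 64 * 64
dense = attention_scores(high_res_tokens)
sparse = attention_scores(high_res_tokens, keys_per_query=256)
print(dense, sparse, dense // sparse)  # 16777216 1048576 16
```

Dense attention grows quadratically with the number of tokens, while a fixed per-query budget grows linearly, which is why sparsifying only the high-resolution stage pays off as resolution increases.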

Findings

01 · Effective high-resolution image editing that remains consistent with the rest of the scene.

02 · Captures long-range interactions, such as reflections of landscapes onto water.

03 · Outperforms previous methods in qualitative and quantitative evaluations.

Abstract

We present ASSET, a neural architecture for automatically modifying an input high-resolution image according to a user's edits on its semantic segmentation map. Our architecture is based on a transformer with a novel attention mechanism. Our key idea is to sparsify the transformer's attention matrix at high resolutions, guided by dense attention extracted at lower image resolutions. While previous attention mechanisms are computationally too expensive for handling high-resolution images or are overly constrained within specific image regions, hampering long-range interactions, our novel attention mechanism is both computationally efficient and effective. Our sparsified attention mechanism is able to capture long-range interactions and context, leading to synthesizing interesting phenomena in scenes, such as reflections of landscapes onto water or flora consistent with the rest of the…
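The guidance idea in the abstract can be illustrated with a minimal NumPy sketch. It assumes a low-resolution token grid of size h × w, a high-resolution grid exactly `scale` times larger in each dimension, raster-order (autoregressive) causality, and a hypothetical top-k rule: a high-resolution query may attend to a high-resolution key only if the key's low-resolution parent is among the `top_k` keys most attended by the query's parent. The names `guided_sparse_mask` and `masked_attention` and the selection rule itself are illustrative assumptions, not the code in the official difanliu/asset repository.

```python
import numpy as np


def guided_sparse_mask(low_attn, h, w, scale, top_k):
    """Boolean (N, N) mask over high-res tokens, guided by low-res attention.

    low_attn: (h*w, h*w) dense attention probabilities at low resolution.
    Assumption: each high-res token's "parent" is the low-res token covering it.
    """
    H, W = h * scale, w * scale
    rows, cols = np.divmod(np.arange(H * W), W)          # raster-order coordinates
    parent = (rows // scale) * w + (cols // scale)       # covering low-res token
    # Hypothetical rule: keep the top_k most attended low-res keys per low-res query.
    top = np.argsort(-low_attn, axis=1)[:, :top_k]
    keep_low = np.zeros_like(low_attn, dtype=bool)
    np.put_along_axis(keep_low, top, True, axis=1)
    keep_low &= low_attn > 0                             # drop zero-attention picks
    # High-res query q may see key k if parent(k) was selected for parent(q).
    mask = keep_low[parent[:, None], parent[None, :]]
    # Autoregressive causality in raster order; keep the diagonal so no row is empty.
    mask &= np.tril(np.ones((H * W, H * W), dtype=bool))
    mask |= np.eye(H * W, dtype=bool)
    return mask


def masked_attention(q, k, v, mask):
    """Scaled dot-product attention restricted to positions where mask is True."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    scores -= scores.max(axis=1, keepdims=True)          # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v


# Toy example with hypothetical sizes.
rng = np.random.default_rng(0)
h = w = 4
scale, top_k, d = 2, 3, 8
n_low = h * w
logits = rng.normal(size=(n_low, n_low))
logits = np.where(np.tril(np.ones((n_low, n_low), dtype=bool)), logits, -np.inf)
low_attn = np.exp(logits - logits.max(axis=1, keepdims=True))
low_attn /= low_attn.sum(axis=1, keepdims=True)

mask = guided_sparse_mask(low_attn, h, w, scale, top_k)
n_high = (h * scale) * (w * scale)
q, k, v = (rng.normal(size=(n_high, d)) for _ in range(3))
out = masked_attention(q, k, v, mask)
print(out.shape, int(mask.sum(axis=1).max()))
```

For clarity the sketch materializes a dense boolean mask; an efficient implementation would instead gather only the selected key indices for each query, which is where the computational savings actually come from.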

Code & Models

Repositories

difanliu/asset · PyTorch · Official

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis