Unified Editing of Panorama, 3D Scenes, and Videos Through Disentangled Self-Attention Injection

Gihyun Kwon; Jangho Park; Jong Chul Ye

arXiv:2405.16823 · cs.CV · May 28, 2024

Open Access

TL;DR

This paper introduces a unified editing framework that leverages shared self-attention in a 2D diffusion model to enable consistent editing across panoramas, 3D scenes, and videos, without training a separate model for each modality.

Contribution

It proposes a novel self-attention injection method that unifies editing across multiple modalities using a single 2D text-to-image diffusion model.
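To make "self-attention injection" concrete, here is a minimal NumPy sketch of one common form of the idea, not the authors' implementation. It assumes that during editing, the edited image keeps its own queries while its keys and values are replaced by ones cached from a pass over the source image. The function names, weight matrices, and toy shapes are hypothetical. In a real diffusion model this would happen inside chosen U-Net self-attention layers at chosen denoising steps, and the paper's exact choices may differ.

```python
import numpy as np


def attention(q, k, v):
    """Scaled dot-product attention. q: (n_q, d); k, v: (n_k, d)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


def cache_source_kv(x_src, w_k, w_v):
    """Keys/values recorded from the source-image pass (one layer, one timestep)."""
    return x_src @ w_k, x_src @ w_v


def injected_self_attention(x_edit, src_kv, w_q):
    """Hypothetical injection step: queries come from the image being edited,
    keys/values come from the cached source pass, with the intent that the
    source's structure and appearance carry over into the edit."""
    k_src, v_src = src_kv
    return attention(x_edit @ w_q, k_src, v_src)


rng = np.random.default_rng(0)
d, n_tokens = 8, 16
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
x_src = rng.standard_normal((n_tokens, d))   # source-image tokens
x_edit = rng.standard_normal((n_tokens, d))  # tokens of the image being edited
out = injected_self_attention(x_edit, cache_source_kv(x_src, w_k, w_v), w_q)
print(out.shape)  # (16, 8)
```

A quick sanity check on the design: if the "edited" tokens equal the source tokens, injection reduces to ordinary self-attention on the source.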

Findings

01

Enables consistent editing of videos and 3D scenes using shared self-attention.

02

Supports editing of panoramic images with semantic consistency.

03

Demonstrates that a single 2D text-to-image model can handle diverse visual modalities (3D scenes, videos, and panoramas) without modality-specific training.

Abstract

While text-to-image models have achieved impressive capabilities in image generation and editing, applying them across various modalities often requires training separate models. Inspired by existing methods for single-image editing with self-attention injection and for video editing with shared attention, we propose a novel unified editing framework that combines the strengths of both approaches using only a basic 2D text-to-image (T2I) diffusion model. Specifically, we design a sampling method that enables editing of consecutive images while maintaining semantic consistency, by sharing self-attention features during both the reference and the consecutive image sampling processes. Experimental results confirm that our method enables editing across diverse modalities, including 3D scenes, videos, and panorama images.
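The shared-attention sampling described in the abstract can be sketched at the level of a single attention layer. This is a minimal NumPy illustration under stated assumptions, not the authors' code. It assumes the first image is the reference and uses ordinary self-attention. Each consecutive image's queries then attend to the reference's keys and values concatenated with its own. The function names, weights, and toy shapes are hypothetical, and the paper's actual choice of which features are shared, at which layers and timesteps, may differ.

```python
import numpy as np


def attention(q, k, v):
    """Scaled dot-product attention. q: (n_q, d); k, v: (n_k, d)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


def shared_self_attention(x, ref_kv, w_q, w_k, w_v):
    """Assumed form: a frame's queries attend to the reference keys/values
    concatenated with the frame's own keys/values."""
    k_ref, v_ref = ref_kv
    k = np.concatenate([k_ref, x @ w_k], axis=0)
    v = np.concatenate([v_ref, x @ w_v], axis=0)
    return attention(x @ w_q, k, v)


def process_sequence(frames, w_q, w_k, w_v):
    """Run one attention layer over a reference frame and its consecutive frames."""
    ref = frames[0]
    ref_kv = (ref @ w_k, ref @ w_v)  # shared with every later frame
    outputs = [attention(ref @ w_q, *ref_kv)]  # reference: ordinary self-attention
    for x in frames[1:]:
        outputs.append(shared_self_attention(x, ref_kv, w_q, w_k, w_v))
    return outputs


rng = np.random.default_rng(0)
d, n_tokens, n_frames = 8, 16, 4
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
frames = [rng.standard_normal((n_tokens, d)) for _ in range(n_frames)]
outs = process_sequence(frames, w_q, w_k, w_v)
print(len(outs), outs[1].shape)  # 4 (16, 8)
```

Concatenating rather than replacing keys and values lets each frame keep its own content while drawing on the reference. Also, if a frame is identical to the reference, the duplicated key/value pairs leave the softmax-weighted average unchanged, so shared attention reduces to plain self-attention.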

Taxonomy

Topics: Advanced Vision and Imaging · Computer Graphics and Visualization Techniques · Advanced Image Processing Techniques

Methods: Diffusion