Auto DragGAN: Editing the Generative Image Manifold in an Autoregressive Manner

Pengxiang Cai; Zhiwei Liu; Guibo Zhu; Yunfang Niu; Jinqiao Wang

arXiv:2407.18656·cs.CV·July 29, 2024

Open Access

TL;DR

Auto DragGAN introduces a novel autoregressive approach using a transformer-based network to enable fast, pixel-level precise image editing by predicting latent code movements, outperforming previous methods in speed and accuracy.

Contribution

This work is the first to employ a regression-based network for learning StyleGAN latent variations during image dragging, achieving high-fidelity, pixel-level editing with real-time inference.

Findings

01 Achieves state-of-the-art inference speed and editing quality.

02 Predicts small pixel movements for high-fidelity edits.

03 Uses autoregressive latent trajectory prediction for stability.

Abstract

Pixel-level fine-grained image editing remains an open challenge. Previous works fail to achieve an ideal trade-off between control granularity and inference speed. They either fail to achieve pixel-level fine-grained control, or their inference speed requires optimization. To address this, this paper for the first time employs a regression-based network to learn the variation patterns of StyleGAN latent codes during the image dragging process. This method enables pixel-level precision in dragging editing with little time cost. Users can specify handle points and their corresponding target points on any GAN-generated images, and our method will move each handle point to its corresponding target point. Through experimental analysis, we discover that a short movement distance from handle points to target points yields a high-fidelity edited image, as the model only needs to predict the…
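The idea in the abstract, that short handle-to-target movements are easier to predict than long ones, can be illustrated with a minimal sketch. The following is not the authors' implementation: the abstract is truncated here and does not specify the network, the step size, or the latent layout. `plan_substeps`, `drag_autoregressively`, `max_step`, and `toy_predict_delta` are hypothetical names introduced only for illustration, and the toy predictor stands in for a trained regression network.

```python
import numpy as np


def plan_substeps(handle, target, max_step=2.0):
    """Split the straight path handle -> target into waypoints
    that are at most max_step pixels apart (assumed step size)."""
    handle = np.asarray(handle, dtype=float)
    target = np.asarray(target, dtype=float)
    distance = np.linalg.norm(target - handle)
    n_steps = max(1, int(np.ceil(distance / max_step)))
    # Fractions 1/n, 2/n, ..., 1 along the segment; the last waypoint is the target.
    fractions = np.arange(1, n_steps + 1) / n_steps
    return handle + fractions[:, None] * (target - handle)


def drag_autoregressively(latent, handle, target, predict_delta, max_step=2.0):
    """Call predict_delta once per short sub-step, feeding each
    updated latent code back in as the input to the next call."""
    current = np.asarray(handle, dtype=float)
    for waypoint in plan_substeps(handle, target, max_step):
        latent = latent + predict_delta(latent, current, waypoint)
        current = waypoint
    return latent


def toy_predict_delta(latent, src, dst):
    """Hypothetical stand-in for a trained regressor: it writes the
    2D displacement into the first two latent dimensions."""
    delta = np.zeros_like(latent)
    delta[:2] = dst - src
    return delta


waypoints = plan_substeps((10, 10), (16, 18), max_step=2.0)
edited = drag_autoregressively(np.zeros(8), (10, 10), (16, 18), toy_predict_delta)
print(waypoints.shape, edited[:2])
```

Here a 10-pixel drag is broken into five 2-pixel sub-steps, so each call to the predictor only has to account for a small displacement, and the accumulated latent change covers the full drag.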

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Computer Graphics and Visualization Techniques · Advanced Vision and Imaging

Methods: Dense Connections · Feedforward Network · R1 Regularization · Convolution · Adaptive Instance Normalization · StyleGAN · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings