Auto DragGAN: Editing the Generative Image Manifold in an Autoregressive Manner
Pengxiang Cai, Zhiwei Liu, Guibo Zhu, Yunfang Niu, Jinqiao Wang

TL;DR
Auto DragGAN introduces a novel autoregressive approach using a transformer-based network to enable fast, pixel-level precise image editing by predicting latent code movements, outperforming previous methods in speed and accuracy.
Contribution
This work is the first to employ a regression-based network for learning StyleGAN latent variations during image dragging, achieving high-fidelity, pixel-level editing with real-time inference.
Findings
Achieves state-of-the-art inference speed and editing quality.
Predicts small pixel movements for high-fidelity edits.
Uses autoregressive latent trajectory prediction for stability.
Abstract
Pixel-level fine-grained image editing remains an open challenge. Previous works fail to achieve an ideal trade-off between control granularity and inference speed: they either lack pixel-level fine-grained control or are too slow at inference. To address this, this paper is the first to employ a regression-based network to learn the variation patterns of StyleGAN latent codes during the image dragging process. This method enables pixel-level precision in drag-based editing at little time cost. Users can specify handle points and their corresponding target points on any GAN-generated image, and our method will move each handle point to its corresponding target point. Through experimental analysis, we discover that a short movement distance from handle points to target points yields a high-fidelity edited image, as the model only needs to predict the…
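The idea of reaching a distant target through a chain of short movements can be illustrated with a minimal Python sketch. This is not the authors' implementation: `predict_step` is a hypothetical stand-in for the paper's regression network, and the straight-line waypoints and the `max_step` default are assumptions made only for illustration.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Latent = List[float]
# Hypothetical signature: (latent, current handle, next waypoint) -> updated latent.
StepPredictor = Callable[[Latent, Point, Point], Latent]


def split_drag(handle: Point, target: Point, max_step: float) -> List[Point]:
    """Return waypoints from handle to target, spaced at most max_step apart.

    Assumption: waypoints lie on a straight line; the paper may choose
    its intermediate points differently.
    """
    if max_step <= 0:
        raise ValueError("max_step must be positive")
    dx, dy = target[0] - handle[0], target[1] - handle[1]
    distance = math.hypot(dx, dy)
    n_steps = max(1, math.ceil(distance / max_step))
    return [
        (handle[0] + dx * i / n_steps, handle[1] + dy * i / n_steps)
        for i in range(1, n_steps + 1)
    ]


def autoregressive_drag(
    latent: Latent,
    handle: Point,
    target: Point,
    predict_step: StepPredictor,
    max_step: float = 2.0,  # assumed value, in pixels
) -> Tuple[Latent, Point]:
    """Call predict_step once per short sub-movement.

    Each call receives the latent produced by the previous call,
    which is what makes the loop autoregressive.
    """
    current = handle
    for waypoint in split_drag(handle, target, max_step):
        latent = predict_step(latent, current, waypoint)
        current = waypoint
    return latent, current
```

Under these assumptions, a real predictor would be a trained network mapping a latent code and one short handle displacement to the next latent code; any callable with the same signature can be dropped into the loop for experimentation.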
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Computer Graphics and Visualization Techniques · Advanced Vision and Imaging
Methods: Dense Connections · Feedforward Network · R1 Regularization · Convolution · Adaptive Instance Normalization · StyleGAN · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
