Towards Controllable and Photorealistic Region-wise Image Manipulation
Ansheng You, Chenglin Zhou, Qixuan Zhang, Lan Xu

TL;DR
This paper introduces a self-supervised auto-encoder model for controllable, region-wise image editing that disentangles content and style, enabling photorealistic, flexible manipulations without extra annotations.
Contribution
The proposed model achieves explicit content-style disentanglement and supports region-specific style transfer using only self-supervision, advancing controllable image editing techniques.
Findings
Effective region-wise style transfer demonstrated
Supports latent space interpolation and cross-domain style transfer
No extra annotations needed, only self-supervision
Abstract
Adaptive and flexible image editing is a desirable capability of modern generative models. In this work, we present a generative model with an auto-encoder architecture for per-region style manipulation. We apply a code consistency loss to enforce explicit disentanglement between the content and style latent representations, so that the content and style of generated samples are consistent with their corresponding content and style references. The model is also constrained by a content alignment loss, which ensures that foreground editing does not interfere with background content. As a result, given masks of the regions of interest provided by users, our model supports foreground region-wise style transfer. Notably, our model requires no extra annotations, such as semantic labels, and relies only on self-supervision. Extensive experiments show the effectiveness of the proposed method and exhibit the flexibility of the…
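The two training objectives named in the abstract can be illustrated with a minimal sketch. It rests on several assumptions not stated in the abstract: the distance is taken to be a mean absolute (L1) error, images and latent codes are flattened to plain Python lists, `encode_content` and `encode_style` are hypothetical stand-ins for the model's content and style encoders, and the user mask marks the foreground region of interest with 1 and the background with 0. It is an illustration of the idea, not the authors' implementation.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Encoder = Callable[[Vector], List[float]]  # hypothetical encoder signature


def l1(a: Vector, b: Vector) -> float:
    """Mean absolute error between two equal-length vectors (assumed distance)."""
    if len(a) != len(b):
        raise ValueError("length mismatch")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def code_consistency_loss(
    generated: Vector,
    content_code: Vector,
    style_code: Vector,
    encode_content: Encoder,
    encode_style: Encoder,
) -> float:
    """Re-encode the generated sample: its content code should match the
    content reference's code and its style code the style reference's code."""
    return l1(encode_content(generated), content_code) + l1(
        encode_style(generated), style_code
    )


def content_alignment_loss(generated: Vector, content_ref: Vector, mask: Vector) -> float:
    """Penalise any change outside the user mask (mask == 0 marks background)."""
    if not (len(generated) == len(content_ref) == len(mask)):
        raise ValueError("length mismatch")
    background = [i for i, m in enumerate(mask) if m == 0]
    if not background:
        return 0.0  # the whole image is foreground, so nothing to preserve
    return sum(abs(generated[i] - content_ref[i]) for i in background) / len(background)


if __name__ == "__main__":
    # Toy 4-"pixel" example; the hypothetical encoders just slice the vector.
    content_ref = [0.5, 0.2, 0.1, 0.7]
    mask = [0, 0, 1, 1]               # last two pixels are the edited region
    generated = [0.5, 0.2, 0.9, 0.1]  # only the foreground was restyled
    print(content_alignment_loss(generated, content_ref, mask))  # 0.0
    print(
        code_consistency_loss(
            generated,
            content_code=[0.5, 0.2],
            style_code=[0.9, 0.1],
            encode_content=lambda x: list(x[:2]),
            encode_style=lambda x: list(x[2:]),
        )
    )  # 0.0
```

Because the content alignment term averages only over background pixels, restyling the masked foreground costs nothing, while any leakage into the background is penalised; this is the behaviour the abstract attributes to that loss.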
