FACEMUG: A Multimodal Generative and Fusion Framework for Local Facial Editing
Wanglong Lu, Jikai Wang, Xiaogang Jin, Xianta Jiang, Hanli Zhao

TL;DR
FACEMUG is a novel multimodal framework for local facial editing that supports diverse input modalities, maintains image quality after multiple edits, and enables fine-grained, semantic, and globally consistent facial manipulations.
Contribution
It introduces a unified multimodal generative model with a novel fusion mechanism and a self-supervised latent warping algorithm for improved local facial editing.
Findings
Outperforms state-of-the-art methods in editing quality and flexibility.
Supports multiple input modalities including sketches, text, and attribute labels.
Maintains high image quality after multiple incremental edits.
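To illustrate why restricting an edit to a local region helps preserve quality over repeated edits, the following is a minimal sketch, not the FACEMUG method itself. It assumes a simple mask-based compositing step, where generated pixels replace the original only inside a binary mask. The names `composite` and `fake_generator` are hypothetical, and `fake_generator` is a stand-in that perturbs every pixel rather than a real generative model.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values


def composite(original: Image, generated: Image, mask: Image) -> Image:
    """Take generated pixels where mask == 1 and original pixels where mask == 0."""
    return [
        [m * g + (1.0 - m) * o for o, g, m in zip(o_row, g_row, m_row)]
        for o_row, g_row, m_row in zip(original, generated, mask)
    ]


def fake_generator(image: Image) -> Image:
    """Hypothetical stand-in for a generative model: slightly alters every pixel."""
    return [[0.9 * p + 0.05 for p in row] for row in image]


original = [[0.2, 0.4], [0.6, 0.8]]
mask = [[1.0, 0.0], [0.0, 0.0]]  # edit only the top-left pixel

edited = original
for _ in range(5):  # five incremental edits
    edited = composite(edited, fake_generator(edited), mask)

# Pixels outside the mask are untouched after every edit;
# only the masked pixel has changed.
print(edited[0][1] == original[0][1], edited[1] == original[1])
print(edited[0][0] != original[0][0])
```

Without the compositing step, each pass through `fake_generator` would alter every pixel, so changes would accumulate across the whole image with each edit.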
Abstract
Existing facial editing methods have achieved remarkable results, yet they often fall short in supporting multimodal conditional local facial editing. One significant piece of evidence is that their output image quality degrades dramatically after several iterations of incremental editing, as they do not support local editing. In this paper, we present a novel multimodal generative and fusion framework for globally-consistent local facial editing (FACEMUG) that can handle a wide range of input modalities and enable fine-grained and semantic manipulation while leaving unedited parts unchanged. Different modalities, including sketches, semantic maps, color maps, exemplar images, text, and attribute labels, are adept at conveying diverse conditioning details, and their combined synergy can provide more explicit guidance for the editing process. We thus integrate all modalities into a…
