# Mixture of Global and Local Experts with Diffusion Transformer for Controllable Face Generation

**Authors:** Xuechao Zou, Shun Zhang, Xing Fu, Yue Li, Kai Li, Yushe Cao, Congyan Lang, Pin Tao, and Junliang Xing

arXiv: 2509.00428 · 2025-09-03

## TL;DR

Face-MoGLE is a Diffusion Transformer (DiT) framework that pairs semantic-decoupled latent modeling with a mixture of global and local experts, targeting controllable, photorealistic face generation with strong zero-shot generalization.

## Contribution

Face-MoGLE proposes a Diffusion Transformer architecture that combines global and local experts through a dynamic gating network, with the aim of improving controllable face synthesis.
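To make the idea of gated expert mixing concrete, here is a minimal NumPy sketch of how a gate might blend one global and several local expert outputs, with coefficients that depend on both the token (spatial location) and the diffusion step. It is an illustration only, not the authors' implementation: the linear gate, the sinusoidal step embedding, and every name and shape (`gated_mixture`, `w_gate`, `timestep_embedding`) are assumptions introduced here.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def timestep_embedding(t, dim):
    """Sinusoidal embedding of a scalar diffusion step (assumed choice; dim must be even)."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    angles = t * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])

def gated_mixture(expert_outputs, tokens, t, w_gate):
    """Blend expert outputs with per-token, per-step gating coefficients.

    expert_outputs: (E, N, D) outputs of E experts for N spatial tokens
    tokens:         (N, D) token features the gate conditions on
    t:              scalar diffusion step
    w_gate:         (2 * D, E) hypothetical linear gate weights
    """
    n, d = tokens.shape
    t_emb = np.tile(timestep_embedding(t, d), (n, 1))   # (N, D)
    gate_in = np.concatenate([tokens, t_emb], axis=1)   # (N, 2D)
    coeffs = softmax(gate_in @ w_gate, axis=-1)         # (N, E), rows sum to 1
    mixed = np.einsum("ne,end->nd", coeffs, expert_outputs)
    return mixed, coeffs

# Toy usage: 1 global + 3 local experts, 16 tokens, 8 channels.
rng = np.random.default_rng(0)
E, N, D = 4, 16, 8
expert_outputs = rng.normal(size=(E, N, D))
tokens = rng.normal(size=(N, D))
w_gate = rng.normal(size=(2 * D, E))

mixed_early, coeffs_early = gated_mixture(expert_outputs, tokens, t=10, w_gate=w_gate)
mixed_late, coeffs_late = gated_mixture(expert_outputs, tokens, t=500, w_gate=w_gate)
```

Because the step embedding feeds the gate, the same token can weight the experts differently at `t=10` and `t=500`, which is the behavior the summary describes as coefficients that evolve with diffusion steps and spatial locations.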

## Key findings

- Effective in both multimodal and monomodal face generation settings
- Achieves strong zero-shot generalization
- Outperforms existing methods in controllability and realism

## Abstract

Controllable face generation poses critical challenges in generative modeling due to the intricate balance required between semantic controllability and photorealism. While existing approaches struggle with disentangling semantic controls from generation pipelines, we revisit the architectural potential of Diffusion Transformers (DiTs) through the lens of expert specialization. This paper introduces Face-MoGLE, a novel framework featuring: (1) Semantic-decoupled latent modeling through mask-conditioned space factorization, enabling precise attribute manipulation; (2) A mixture of global and local experts that captures holistic structure and region-level semantics for fine-grained controllability; (3) A dynamic gating network producing time-dependent coefficients that evolve with diffusion steps and spatial locations. Face-MoGLE provides a powerful and flexible solution for high-quality, controllable face generation, with strong potential in generative modeling and security applications. Extensive experiments demonstrate its effectiveness in multimodal and monomodal face generation settings and its robust zero-shot generalization capability. Project page is available at https://github.com/XavierJiezou/Face-MoGLE.
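As a rough illustration of what "mask-conditioned space factorization" could look like in practice, the sketch below splits an integer semantic label map into one binary mask per region, so that region-level (local) conditions can be handled separately from the full (global) map. This reading of the phrase is an assumption made here for illustration; the function name `factorize_mask`, the three-class toy map, and the one-hot encoding are hypothetical and may not match the paper's actual formulation.

```python
import numpy as np

def factorize_mask(label_mask, num_classes):
    """Split an (H, W) integer label map into (num_classes, H, W) binary masks."""
    classes = np.arange(num_classes).reshape(-1, 1, 1)
    # Broadcasting compares every pixel against every class index.
    return (label_mask[None, :, :] == classes).astype(np.float32)

# Toy 3x3 semantic map with three hypothetical regions (0, 1, 2).
label_mask = np.array([
    [0, 0, 1],
    [2, 1, 1],
    [2, 2, 0],
])

region_masks = factorize_mask(label_mask, num_classes=3)  # local, per-region conditions
global_mask = label_mask                                   # full map as the holistic condition
```

Each pixel belongs to exactly one region, so the binary masks sum to one at every location and the original map can be recovered with `region_masks.argmax(axis=0)`.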

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2509.00428/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/2509.00428/full.md

## References

65 references — full list in the complete paper: https://tomesphere.com/paper/2509.00428/full.md

---
Source: https://tomesphere.com/paper/2509.00428