PortraitDirector: A Hierarchical Disentanglement Framework for Controllable and Real-time Facial Reenactment
Chaonan Ji, Jinwei Qi, Sheng Xu, Peng Zhang, Bang Zhang

TL;DR
PortraitDirector introduces a hierarchical framework for facial reenactment that disentangles motion into physical and emotional components, enabling high-fidelity, controllable, real-time face animation.
Contribution
Proposes a hierarchical disentanglement approach that splits facial motion into a Spatial Layer for physical movements and a Semantic Layer for emotional content, improving both control and fidelity in facial reenactment.
Findings
Achieves 512×512 face reenactment at 20 FPS with 800 ms latency.
Effectively disentangles head pose, expressions, and emotions for controllable reenactment.
Maintains high fidelity and real-time performance through an optimized architecture.
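The throughput and latency figures above measure different things, and a quick calculation makes the relationship concrete. The sketch below is illustrative arithmetic only: the function names are hypothetical, and the summary does not state how the 800 ms latency is measured or how the pipeline is buffered.

```python
def frame_interval_ms(fps: float) -> float:
    """Time between consecutive output frames, in milliseconds."""
    return 1000.0 / fps


def latency_in_frames(fps: float, latency_ms: float) -> float:
    """End-to-end latency expressed as a number of frame intervals."""
    return latency_ms / frame_interval_ms(fps)


# Figures taken from the findings above.
fps = 20.0
latency_ms = 800.0

print(frame_interval_ms(fps))              # 50.0 ms between frames
print(latency_in_frames(fps, latency_ms))  # 16.0 frame intervals of delay
```

At 20 FPS a new frame is emitted every 50 ms, so an 800 ms latency corresponds to a delay of about 16 frame intervals between a driving input and its rendered output.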
Abstract
Existing facial reenactment methods struggle with a trade-off between expressiveness and fine-grained controllability. Holistic facial reenactment models often sacrifice granular control for expressiveness, while methods designed for control may struggle with fidelity and robust disentanglement. Instead of treating facial motion as a monolithic signal, we explore an alternative compositional perspective. In this paper, we introduce PortraitDirector, a novel framework that formulates face reenactment as a hierarchical composition task, achieving high-fidelity and controllable results. We employ a Hierarchical Motion Disentanglement and Composition strategy, deconstructing facial motion into a Spatial Layer for physical movements and a Semantic Layer for emotional content. The Spatial Layer comprises: (i) global head pose, managed via a dedicated representation and injection pathway; (ii)…
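The compositional view described in the abstract can be pictured as a small data structure. The following is a minimal sketch under stated assumptions, not the authors' implementation: all class and field names are hypothetical, head pose is assumed to be encoded as (yaw, pitch, roll) angles, emotion is assumed to be a plain label, and `local_motion` is only a stand-in for the remaining Spatial Layer components, which the truncated abstract does not list.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SpatialLayer:
    # Global head pose; the (yaw, pitch, roll) encoding is an assumption.
    head_pose: Tuple[float, float, float]
    # Placeholder for the other spatial components, which are elided
    # in the abstract; its contents here are purely hypothetical.
    local_motion: Tuple[float, ...] = ()


@dataclass(frozen=True)
class SemanticLayer:
    # Emotional content; a string label is an assumed simplification.
    emotion: str


@dataclass(frozen=True)
class FacialMotion:
    spatial: SpatialLayer
    semantic: SemanticLayer


def compose(pose_source: FacialMotion, emotion_source: FacialMotion) -> FacialMotion:
    """Combine the spatial layer of one driver with the semantic layer of another."""
    return FacialMotion(spatial=pose_source.spatial, semantic=emotion_source.semantic)


driver_a = FacialMotion(SpatialLayer((15.0, -5.0, 0.0)), SemanticLayer("neutral"))
driver_b = FacialMotion(SpatialLayer((0.0, 0.0, 0.0)), SemanticLayer("happy"))

mixed = compose(driver_a, driver_b)
print(mixed.spatial.head_pose, mixed.semantic.emotion)
```

Keeping the layers as separate fields is what makes independent control possible in this toy version: the head pose can come from one driving signal while the emotion comes from another.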
