Exo2EgoSyn: Unlocking Foundation Video Generation Models for Exocentric-to-Egocentric Video Synthesis

Mohammad Mahdi; Yuqian Fu; Nedko Savov; Jiancheng Pan; Danda Pani Paudel; Luc Van Gool

arXiv:2511.20186·cs.CV·November 26, 2025

TL;DR

Exo2EgoSyn adapts foundation video models to enable high-quality egocentric video synthesis from exocentric views, using view alignment, multi-view conditioning, and pose-aware latent injection.

Contribution

The paper introduces Exo2EgoSyn, a framework that extends foundation video models to cross-view exocentric-to-egocentric synthesis without retraining.

Findings

01

Significant improvement in exo-to-ego video synthesis quality.

02

Effective cross-view synthesis without retraining foundation models.

03

Validated on the ExoEgo4D dataset with promising results.

Abstract

Foundation video generation models such as WAN 2.2 exhibit strong text- and image-conditioned synthesis abilities but remain constrained to the same-view generation setting. In this work, we introduce Exo2EgoSyn, an adaptation of WAN 2.2 that unlocks Exocentric-to-Egocentric (Exo2Ego) cross-view video synthesis. Our framework consists of three key modules. Ego-Exo View Alignment (EgoExo-Align) enforces latent-space alignment between exocentric and egocentric first-frame representations, reorienting the generative space from the given exo view toward the ego view. Multi-view Exocentric Video Conditioning (MultiExoCon) aggregates multi-view exocentric videos into a unified conditioning signal, extending WAN 2.2 beyond its vanilla single-image or text conditioning. Furthermore, Pose-Aware Latent Injection (PoseInj) injects relative exo-to-ego camera pose information into the latent state,…
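As a rough illustration of what "relative exo-to-ego camera pose" can mean, the sketch below computes the rigid transform that maps points from an exocentric camera frame into an egocentric camera frame. It is not the paper's PoseInj module: the camera-to-world 4x4 matrix convention and the helper names `matmul`, `invert_rigid`, and `relative_pose` are assumptions made for this example only.

```python
# Illustrative sketch only. The camera-to-world convention and all function
# names here are assumptions, not taken from the Exo2EgoSyn paper.

def matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def invert_rigid(t):
    """Invert a rigid transform [R | t] as [R^T | -R^T t]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]  # transpose of R
    trans = [-sum(r_t[i][k] * t[k][3] for k in range(3)) for i in range(3)]
    return [r_t[i] + [trans[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def relative_pose(exo_c2w, ego_c2w):
    """Transform taking exo-camera coordinates to ego-camera coordinates."""
    return matmul(invert_rigid(ego_c2w), exo_c2w)

# Exo camera at the world origin; ego camera shifted one unit along x.
exo = [[1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]
ego = [[1.0, 0.0, 0.0, 1.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]
rel = relative_pose(exo, ego)
print([rel[i][3] for i in range(3)])  # translation part of the relative pose
```

In this toy setup both cameras share the same orientation, so the relative rotation is the identity and the exo camera's origin lands at x = -1 in the ego frame. How such a pose is encoded and injected into the latent state is not specified in the truncated abstract above.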

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Human Motion and Animation · 3D Shape Modeling and Analysis