EgoX: Egocentric Video Generation from a Single Exocentric Video
Taewoong Kang, Kinam Kim, Dohyeon Kim, Minho Park, Junha Hyung, Jaegul Choo

TL;DR
EgoX is a novel framework that converts third-person videos into realistic egocentric videos by leveraging large-scale video diffusion models, geometric guidance, and a unified conditioning strategy, enabling coherent and high-fidelity synthesis.
Contribution
We introduce EgoX, a new method that uses pretrained diffusion models with lightweight adaptation and a geometry-guided attention mechanism for egocentric video generation from a single exocentric video.
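The abstract identifies the lightweight adaptation as LoRA. As a rough illustration of the general LoRA idea only (not the authors' implementation), the sketch below wraps one frozen linear weight with a trainable low-rank update. The class name `LoRALinear`, the rank, the scaling, and the initialization are all assumptions chosen for illustration.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update (generic LoRA sketch).

    Hypothetical helper for illustration; it is not taken from the EgoX code.
    """

    def __init__(self, weight, rank=4, alpha=4.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight                                # pretrained weight, kept frozen
        self.A = rng.normal(0.0, 0.01, (rank, in_dim))      # trainable down-projection
        self.B = np.zeros((out_dim, rank))                  # trainable up-projection, zero at start
        self.scale = alpha / rank                           # assumed scaling convention

    def __call__(self, x):
        # Base output plus the scaled low-rank correction x A^T B^T.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self):
        # Only A and B would be updated during adaptation.
        return self.A.size + self.B.size


# Example with made-up sizes: a 64x64 layer adapted at rank 4.
base_weight = np.eye(64)
layer = LoRALinear(base_weight, rank=4)
frozen_count = base_weight.size
adapter_count = layer.trainable_params()
```

Because `B` starts at zero, the wrapped layer initially reproduces the pretrained output exactly, and only the much smaller `A` and `B` matrices would need training.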
Findings
Achieves coherent and realistic egocentric video synthesis.
Demonstrates robustness across unseen and in-the-wild videos.
Scales effectively with minimal additional training.
Abstract
Egocentric perception enables humans to experience and understand the world directly from their own point of view. Translating exocentric (third-person) videos into egocentric (first-person) videos opens up new possibilities for immersive understanding but remains highly challenging due to extreme camera pose variations and minimal view overlap. This task requires faithfully preserving visible content while synthesizing unseen regions in a geometrically consistent manner. To achieve this, we present EgoX, a novel framework for generating egocentric videos from a single exocentric input. EgoX leverages the pretrained spatio-temporal knowledge of large-scale video diffusion models through lightweight LoRA adaptation and introduces a unified conditioning strategy that combines exocentric and egocentric priors via width-wise and channel-wise concatenation. Additionally, a geometry-guided…
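To make the two concatenation operations named in the abstract concrete, here is a minimal sketch on dummy latents laid out as (channels, frames, height, width). Which prior is joined along which axis, the zero padding of the egocentric prior, and every name and shape below are assumptions for illustration; the text above does not specify them.

```python
import numpy as np


def concat_width(a, b):
    """Place two (C, T, H, W) latents side by side along the width axis."""
    if a.shape[:3] != b.shape[:3]:
        raise ValueError("channels, frames, and height must match")
    return np.concatenate([a, b], axis=3)


def concat_channel(a, b):
    """Stack two (C, T, H, W) latents along the channel axis."""
    if a.shape[1:] != b.shape[1:]:
        raise ValueError("frames, height, and width must match")
    return np.concatenate([a, b], axis=0)


# Hypothetical latent sizes, chosen only to keep the example small.
C, T, H, W = 4, 3, 8, 8
exo_latent = np.zeros((C, T, H, W))      # stands in for the exocentric video latent
noisy_target = np.ones((C, T, H, W))     # stands in for the egocentric latent being denoised
ego_prior = np.full((C, T, H, W), 2.0)   # stands in for an egocentric prior

# Assumed layout: exocentric latent and target sit side by side in width.
wide = concat_width(exo_latent, noisy_target)

# Assumed layout: pad the prior with zeros over the exocentric half so the
# spatial sizes match, then stack it onto the channel axis.
padded_prior = concat_width(np.zeros_like(ego_prior), ego_prior)
model_input = concat_channel(wide, padded_prior)
```

Width-wise concatenation keeps both streams in one token grid that the model can attend across, while channel-wise concatenation attaches a spatially aligned signal to each location; this sketch shows only the tensor bookkeeping, not the model itself.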
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Face recognition and analysis
