EgoX: Egocentric Video Generation from a Single Exocentric Video

Taewoong Kang; Kinam Kim; Dohyeon Kim; Minho Park; Junha Hyung; Jaegul Choo

arXiv:2512.08269 · cs.CV · December 10, 2025

TL;DR

EgoX is a novel framework that converts third-person videos into realistic egocentric videos by leveraging large-scale video diffusion models, geometric guidance, and a unified conditioning strategy, enabling coherent and high-fidelity synthesis.

Contribution

We introduce EgoX, a new method that uses pretrained diffusion models with lightweight adaptation and a geometry-guided attention mechanism for egocentric video generation from a single exocentric video.
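The "lightweight adaptation" named here is LoRA, according to the abstract. As a rough illustration of the general LoRA idea only (not the authors' code), the sketch below adds a trainable low-rank update to a frozen weight matrix. The layer sizes, the rank, the scaling factor `alpha`, and the helper name `lora_forward` are all hypothetical choices made for this example.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=1.0):
    """Frozen linear layer W plus a low-rank update (alpha / r) * B @ A."""
    r = A.shape[0]  # rank of the update
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

rng = np.random.default_rng(0)
d_in, d_out, rank = 16, 8, 2          # hypothetical sizes, not from the paper

W = rng.normal(size=(d_out, d_in))    # frozen pretrained weight
A = rng.normal(size=(rank, d_in))     # trainable down-projection
B = np.zeros((d_out, rank))           # trainable up-projection, zero-initialized
x = rng.normal(size=(4, d_in))        # a batch of 4 input vectors

y_base = x @ W.T                      # output of the pretrained layer alone
y_lora = lora_forward(x, W, A, B)     # output with the (still untrained) adapter

trainable = A.size + B.size           # parameters updated during fine-tuning
frozen = W.size                       # parameters left untouched
print(trainable, frozen)
```

Because `B` starts at zero, the adapted layer initially reproduces the pretrained output exactly, and only the two small matrices are trained, which is what makes this kind of adaptation lightweight.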

Findings

01. Achieves coherent and realistic egocentric video synthesis.

02. Demonstrates robustness across unseen and in-the-wild videos.

03. Scales effectively with minimal additional training.

Abstract

Egocentric perception enables humans to experience and understand the world directly from their own point of view. Translating exocentric (third-person) videos into egocentric (first-person) videos opens up new possibilities for immersive understanding but remains highly challenging due to extreme camera pose variations and minimal view overlap. This task requires faithfully preserving visible content while synthesizing unseen regions in a geometrically consistent manner. To achieve this, we present EgoX, a novel framework for generating egocentric videos from a single exocentric input. EgoX leverages the pretrained spatio-temporal knowledge of large-scale video diffusion models through lightweight LoRA adaptation and introduces a unified conditioning strategy that combines exocentric and egocentric priors via width-wise and channel-wise concatenation. Additionally, a geometry-guided…
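To make the two concatenation axes concrete, here is a minimal shape-level sketch. The abstract excerpt does not say which prior is joined along which axis, so the assignment below (egocentric prior stacked on channels, exocentric latent placed side by side along width, zero-padded to match the channel count) is an assumption for illustration, as are the tensor names and the `(channels, frames, height, width)` layout.

```python
import numpy as np

# Hypothetical latent layout: (channels, frames, height, width)
C, T, H, W = 4, 3, 8, 8
rng = np.random.default_rng(0)

noisy_ego = rng.normal(size=(C, T, H, W))   # latent being denoised (assumed)
ego_prior = rng.normal(size=(C, T, H, W))   # egocentric prior (assumed)
exo_latent = rng.normal(size=(C, T, H, W))  # exocentric condition (assumed)

# Channel-wise concatenation: stack the egocentric prior onto the noisy latent.
ego_branch = np.concatenate([noisy_ego, ego_prior], axis=0)      # (2C, T, H, W)

# Zero-pad the exocentric latent so both branches share a channel count.
exo_branch = np.concatenate(
    [exo_latent, np.zeros_like(exo_latent)], axis=0
)                                                                # (2C, T, H, W)

# Width-wise concatenation: place the two branches side by side.
unified = np.concatenate([exo_branch, ego_branch], axis=3)       # (2C, T, H, 2W)
print(unified.shape)
```

The point of the sketch is only the bookkeeping: channel-wise concatenation keeps the spatial grid and widens the feature dimension, while width-wise concatenation keeps the feature dimension and widens the spatial grid, so a single model input can carry both views.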

Code & Models

Models

🤗 DAVIAN-Robotics/EgoX (model · 54 downloads · 8 likes)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Face recognition and analysis