Intention-driven Ego-to-Exo Video Generation

Hongchen Luo; Kai Zhu; Wei Zhai; Yang Cao

arXiv:2403.09194 · cs.CV · March 19, 2024 · 1 citation

Open Access

TL;DR

This paper introduces IDE, a novel framework for ego-to-exo video generation that uses action intentions as view-independent guides, overcoming previous limitations in handling drastic view changes.

Contribution

The paper proposes an intention-driven approach that leverages human action semantics and head trajectory estimation to generate consistent exocentric videos from egocentric inputs.

Findings

01 Outperforms state-of-the-art models in subjective assessments

02 Achieves higher accuracy in head trajectory estimation

03 Effectively preserves content and motion consistency

Abstract

Ego-to-exo video generation refers to generating the corresponding exocentric video according to the egocentric video, providing valuable applications in AR/VR and embodied AI. Benefiting from advancements in diffusion model techniques, notable progress has been achieved in video generation. However, existing methods build upon the spatiotemporal consistency assumptions between adjacent frames, which cannot be satisfied in the ego-to-exo scenarios due to drastic changes in views. To this end, this paper proposes an Intention-Driven Ego-to-exo video generation framework (IDE) that leverages action intention consisting of human movement and action description as view-independent representation to guide video generation, preserving the consistency of content and motion. Specifically, the egocentric head trajectory is first estimated through multi-view stereo matching. Then, cross-view…

Taxonomy

Topics: Cinema and Media Studies · Advanced Vision and Imaging · Video Coding and Compression Technologies

Methods: Diffusion