Cross-View Exocentric to Egocentric Video Synthesis
Gaowen Liu, Hao Tang, Hugo Latapie, Jason Corso, Yan Yan

TL;DR
This paper introduces STA-GAN, a bi-directional spatial-temporal attention fusion GAN that synthesizes egocentric (first-person) videos from exocentric (third-person) views by learning both spatial and temporal features. It is reported to outperform existing methods on the Side2Ego and Top2Ego datasets.
Contribution
The paper proposes a new STA-GAN model with dual discriminators and attention fusion for cross-view video synthesis, addressing the challenge of view transformation.
Findings
STA-GAN significantly outperforms existing methods on Side2Ego and Top2Ego datasets.
The bi-directional attention fusion improves the quality of generated egocentric videos.
Dual discriminators enhance the robustness of network training.
Abstract
The cross-view video synthesis task seeks to generate video sequences of one view from another, dramatically different view. In this paper, we investigate the exocentric (third-person) view to egocentric (first-person) view video generation task. This is challenging because the egocentric view is sometimes remarkably different from the exocentric view, so transforming appearances across the two views is a non-trivial task. Specifically, we propose a novel Bi-directional Spatial Temporal Attention Fusion Generative Adversarial Network (STA-GAN) that learns both spatial and temporal information to generate egocentric video sequences from the exocentric view. The proposed STA-GAN consists of three parts: a temporal branch, a spatial branch, and attention fusion. First, the temporal and spatial branches generate a sequence of fake frames and their corresponding features. The fake frames…
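To make the "two branches plus attention fusion" idea concrete, here is a minimal NumPy sketch of one common way to fuse two candidate videos with attention: a per-pixel softmax over two score maps, used as blending weights. The abstract is truncated before it describes the fusion step, so this is an illustrative assumption and not the paper's actual method. The function name `attention_fuse`, the score maps, and all tensor shapes are hypothetical, the inputs are random placeholders rather than network outputs, and the bi-directional aspect is not modelled.

```python
import numpy as np


def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def attention_fuse(temporal_frames, spatial_frames, temporal_scores, spatial_scores):
    """Blend two candidate videos with per-pixel attention weights.

    Hypothetical shapes (an assumption, not taken from the paper):
      frames: (T, H, W, C)  candidate frames from each branch
      scores: (T, H, W, 1)  unnormalized attention scores from each branch
    """
    # Stack the two score maps and normalize across the branch axis,
    # so the two weights at every pixel sum to 1.
    scores = np.stack([temporal_scores, spatial_scores], axis=0)
    weights = softmax(scores, axis=0)
    # Convex combination of the two candidates; the trailing singleton
    # channel axis of the weights broadcasts over C.
    fused = weights[0] * temporal_frames + weights[1] * spatial_frames
    return fused, weights


# Random placeholders standing in for the outputs of the two branches.
rng = np.random.default_rng(0)
T, H, W, C = 4, 8, 8, 3
temporal_frames = rng.uniform(-1.0, 1.0, size=(T, H, W, C))
spatial_frames = rng.uniform(-1.0, 1.0, size=(T, H, W, C))
temporal_scores = rng.normal(size=(T, H, W, 1))
spatial_scores = rng.normal(size=(T, H, W, 1))

fused, weights = attention_fuse(
    temporal_frames, spatial_frames, temporal_scores, spatial_scores
)
print(fused.shape)
```

Because the weights form a convex combination, every fused pixel lies between the corresponding pixels of the two candidates, which keeps the output in the same value range as the branch outputs.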
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Image Processing Techniques · Generative Adversarial Networks and Image Synthesis
