Realizing Immersive Volumetric Video: A Multimodal Framework for 6-DoF VR Engagement

Zhengxian Yang; Shengqi Wang; Shi Pan; Hongshuai Li; Haoxiang Wang; Lin Li; Guanjun Li; Zhengqi Wen; Borong Lin; Jianhua Tao; Tao Yu

arXiv:2604.09473 · cs.CV · April 13, 2026

TL;DR

This paper introduces a new volumetric media format and a comprehensive pipeline for creating immersive 6-DoF VR experiences from real-world videos, supported by a novel multi-modal dataset and reconstruction methods.

Contribution

The paper presents a new immersive volumetric video format, a multi-view multimodal dataset, and a unified reconstruction pipeline for producing high-quality 6-DoF VR content from real-world captures.

Findings

01 The dataset provides 5K-resolution video at 60 FPS for complex scenes.

02 The reconstruction framework robustly models complex motion.

03 The pipeline produces high-quality, stable audiovisual volumetric content with large interaction spaces.

Abstract

Fully immersive experiences that tightly integrate 6-DoF visual and auditory interaction are essential for virtual and augmented reality. While such experiences can be achieved through computer-generated content, constructing them directly from real-world captured videos remains largely unexplored. We introduce Immersive Volumetric Videos (IVV), a new volumetric media format designed to provide large 6-DoF interaction spaces, audiovisual feedback, and high-resolution, high-frame-rate dynamic content. To support IVV construction, we present ImViD, a multi-view, multimodal dataset built upon a space-oriented capture philosophy. Our custom capture rig enables synchronized multi-view video-audio acquisition during motion, facilitating efficient capture of complex indoor and outdoor scenes with rich foreground-background interactions and challenging dynamics. The dataset provides 5K-resolution…
