TexAVi: Generating Stereoscopic VR Video Clips from Text Descriptions
Vriksha Srihari, R. Bhavya, Shruti Jayaraman, V. Mary Anita Rajam

TL;DR
This paper introduces TexAVi, a method that generates stereoscopic VR videos from text descriptions by combining existing generative models, depth estimation, and image processing to create immersive virtual reality content.
Contribution
It presents a novel pipeline that integrates text-to-image models, depth estimation, and stereoscopic rendering to produce VR videos from textual input, addressing data scarcity and realism challenges.
Findings
Generated VR frames of high visual quality, as assessed by Fréchet Inception Distance (FID) and CLIP Score.
Demonstrated the feasibility of text-driven stereoscopic VR video creation.
Showcased potential applications in virtual reality production and simulation.
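For reference, the two metrics named in the findings have standard definitions. FID compares Gaussian fits to Inception-v3 feature statistics of real and generated images, so lower is better. CLIP Score rescales the cosine similarity between CLIP image and text embeddings, so higher is better. This summary does not state which reference images, feature layer or scaling constant the authors used, so the symbols below follow the usual conventions and are not taken from the paper.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right)

\mathrm{CLIPScore}(I, T) = w \cdot \max\!\left(\cos(E_I, E_T),\, 0\right)
```

Here $\mu_r, \Sigma_r$ and $\mu_g, \Sigma_g$ are the mean and covariance of the feature vectors of the real and generated images. $E_I$ and $E_T$ are the CLIP embeddings of image $I$ and text $T$, and $w$ is a scaling constant: 2.5 in the original CLIPScore formulation and 100 in some library implementations.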
Abstract
While generative models such as text-to-image models, large language models and text-to-video models have seen significant progress, the extension to text-to-virtual-reality remains largely unexplored, due to a lack of training data and the complexity of achieving realistic depth and motion in virtual environments. This paper proposes an approach that combines existing generative systems to produce a stereoscopic virtual reality video from text. The approach has three main stages. We start with a base text-to-image model that captures context from the input text. We then apply Stable Diffusion to the rudimentary image produced, generating frames with enhanced realism and overall quality. These frames are processed with depth estimation algorithms to create left-eye and right-eye views, which are stitched side by side to create an immersive viewing experience. Such systems would be highly beneficial in…
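The last stage described in the abstract takes each enhanced frame and its estimated depth map, produces left-eye and right-eye views, and stitches them side by side. A minimal depth-image-based rendering (DIBR) sketch can illustrate this stage. It makes several assumptions: a grayscale frame stored as nested lists, a depth map already normalized to [0, 1] with 1 meaning nearest, and a hypothetical `max_disparity` parameter in pixels. This summary does not specify the paper's actual depth estimator, disparity mapping or hole-filling strategy, so `shift_view`, `fill_holes`, `side_by_side` and `stereo_frame` are illustrative names only.

```python
from typing import List, Optional

# Assumed representations (not from the paper): a grayscale frame as rows of
# integer intensities, and a depth map normalized to [0, 1] where 1 = nearest.
Image = List[List[int]]
DepthMap = List[List[float]]


def fill_holes(row: List[Optional[int]]) -> None:
    """Fill disoccluded pixels in place by repeating the nearest valid pixel to the left."""
    last = None
    for i, value in enumerate(row):
        if value is None:
            row[i] = last
        else:
            last = value
    # Holes at the start of the row have no left neighbour: use the first valid pixel.
    first = next((v for v in row if v is not None), 0)
    for i, value in enumerate(row):
        if value is not None:
            break
        row[i] = first


def shift_view(image: Image, depth: DepthMap, max_disparity: int, direction: int) -> Image:
    """Forward-warp pixels horizontally by half of a depth-proportional disparity.

    direction=+1 moves near pixels right (left-eye view); -1 moves them left (right-eye view).
    """
    height, width = len(image), len(image[0])
    out: List[List[Optional[int]]] = [[None] * width for _ in range(height)]
    zbuf = [[-1.0] * width for _ in range(height)]  # depth of the pixel written at each target
    for y in range(height):
        for x in range(width):
            d = depth[y][x]
            tx = x + round(direction * d * max_disparity / 2)
            # Z-buffer test: when two source pixels land on the same target, keep the nearer one.
            if 0 <= tx < width and d > zbuf[y][tx]:
                zbuf[y][tx] = d
                out[y][tx] = image[y][x]
        fill_holes(out[y])
    return out  # every hole has been filled, so all entries are ints again


def side_by_side(left: Image, right: Image) -> Image:
    """Concatenate each left-eye row with the matching right-eye row (SBS layout)."""
    return [l_row + r_row for l_row, r_row in zip(left, right)]


def stereo_frame(image: Image, depth: DepthMap, max_disparity: int = 8) -> Image:
    """Build one side-by-side stereo frame from a single frame and its depth map."""
    left = shift_view(image, depth, max_disparity, direction=+1)
    right = shift_view(image, depth, max_disparity, direction=-1)
    return side_by_side(left, right)


if __name__ == "__main__":
    # One row; the second pixel is nearest to the viewer.
    print(stereo_frame([[10, 20, 30, 40]], [[0.0, 1.0, 0.0, 0.0]], max_disparity=2))
    # → [[10, 10, 20, 40, 20, 20, 30, 40]]
```

Applying `stereo_frame` to each frame of a clip and writing the results in order would yield a side-by-side stereo video twice the original width. Near pixels are pushed right in the left-eye view and left in the right-eye view. A VR headset turns this horizontal parallax into perceived depth. Filling holes by repeating the left neighbour is a simplification chosen for brevity. More careful DIBR methods fill disocclusions from the background instead.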
Taxonomy
Methods: Diffusion · Balanced Selection · Contrastive Language-Image Pre-training
