VVGT: Visual Volume-Grounded Transformer
Yuxuan Wang, Qibiao Li, Youcheng Cai

TL;DR
VVGT is a feed-forward, transformer-based framework that maps volumetric data directly to a 3D Gaussian Splatting (3DGS) representation, targeting fast, scalable, and accurate volumetric visualization. The authors report that it surpasses traditional Direct Volume Rendering (DVR) and existing 3DGS approaches.
Contribution
The paper proposes VVGT, a feed-forward, representation-first framework for volumetric data that uses a dual-transformer network with epipolar attention. It removes the need for per-scene optimization and enables real-time visualization.
Findings
Achieves high-quality visualization with faster conversion times than approaches that rely on per-scene optimization.
Demonstrates improved geometric consistency and zero-shot generalization.
Enables interactive and scalable volumetric visualization across diverse datasets.
Abstract
Volumetric visualization has long been dominated by Direct Volume Rendering (DVR), which operates on dense voxel grids and suffers from limited scalability as resolution and interactivity demands increase. Recent advances in 3D Gaussian Splatting (3DGS) offer a representation-centric alternative; however, existing volumetric extensions still depend on costly per-scene optimization, limiting scalability and interactivity. We present VVGT (Visual Volume-Grounded Transformer), a feed-forward, representation-first framework that directly maps volumetric data to a 3D Gaussian Splatting representation, advancing a new paradigm for volumetric visualization beyond DVR. Unlike prior feed-forward 3DGS methods designed for surface-centric reconstruction, VVGT explicitly accounts for volumetric rendering, where each pixel aggregates contributions along a ray. VVGT employs a dual-transformer network…
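The abstract's point that in volumetric rendering "each pixel aggregates contributions along a ray" can be illustrated with standard front-to-back alpha compositing. The sketch below is a generic textbook formulation, not VVGT's renderer: the function name `composite_ray`, the scalar grayscale samples, and the early-termination threshold `stop_at` are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def composite_ray(
    samples: Sequence[Tuple[float, float]],
    stop_at: float = 1e-4,
) -> Tuple[float, float]:
    """Front-to-back alpha compositing along a single ray.

    `samples` holds (intensity, alpha) pairs ordered from near to far.
    Returns (pixel_intensity, remaining_transmittance).
    Illustrative only; names and threshold are hypothetical.
    """
    pixel = 0.0
    transmittance = 1.0  # fraction of light still reaching the camera
    for intensity, alpha in samples:
        # Each sample contributes in proportion to what is not yet occluded.
        pixel += transmittance * alpha * intensity
        transmittance *= 1.0 - alpha
        if transmittance < stop_at:
            break  # later samples are effectively invisible
    return pixel, transmittance


if __name__ == "__main__":
    # Two semi-transparent samples: a bright one in front of a dark one.
    print(composite_ray([(1.0, 0.5), (0.0, 0.5)]))
```

Unlike a surface renderer, where one opaque hit determines the pixel, every semi-transparent sample along the ray affects the result here, which is the property the abstract says surface-centric feed-forward 3DGS methods do not account for.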
