Accelerating 3D Gaussian Splatting using Tensor Cores
Sheng Li, Yang Sui, Yue Wu, Zhuoran Song, Bo Yuan, Xulong Tang, Yue Dai

TL;DR
TensorGS accelerates 3D Gaussian Splatting rendering by recasting rasterization as Tensor Core-compatible matrix operations, achieving a 1.65× speedup with minimal quality loss.
Contribution
The paper introduces TensorGS, a framework that enables Tensor Core acceleration for 3D Gaussian Splatting rasterization, a stage that existing 3DGS systems execute entirely on CUDA cores while Tensor Cores sit idle.
Findings
TensorGS achieves a 1.65× rendering speedup.
Running rasterization in FP16 causes negligible quality degradation.
Cross-tile grouping enhances Gaussian reuse and Tensor Core utilization.
Abstract
3D Gaussian Splatting (3DGS) has become a leading technique for real-time neural rendering and 3D scene reconstruction, but its rendering cost remains too high for many latency-sensitive scenarios. In particular, the rasterization stage in 3DGS dominates end-to-end rendering time, during which the renderer repeatedly evaluates each Gaussian's contribution to each covered pixel, making this stage compute-bound. At the same time, modern GPUs provide high-throughput Tensor Cores for low-precision matrix operations, yet existing 3DGS systems execute rasterization entirely on CUDA cores and leave Tensor Cores idle. We find that 3DGS rendering can be executed in FP16 with negligible quality degradation, suggesting a promising opportunity for Tensor Core acceleration. However, exploiting Tensor Cores for 3DGS is non-trivial because rasterization does not naturally match their execution model.…
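To illustrate why per-pixel Gaussian evaluation can be recast as a matrix operation at all, the sketch below expands the 2D Gaussian exponent into a polynomial in the pixel coordinates, so that the exponents for many Gaussians and many pixels come from a single matrix product. This is a standard algebraic identity and not necessarily the transformation TensorGS uses; the function names and the `(mx, my, a, b, c)` layout (projected mean plus inverse-covariance entries) are assumptions made for this example, and plain Python floats stand in for the FP16 Tensor Core arithmetic.

```python
# Hypothetical sketch: Gaussian exponents as a (G x 6) @ (6 x P) matrix product.
# Exponent for one Gaussian/pixel pair, with dx = px - mx and dy = py - my:
#   power = -0.5 * (a*dx^2 + c*dy^2) - b*dx*dy

def gaussian_row(mx, my, a, b, c):
    """Coefficients of `power` over the basis [px^2, px*py, py^2, px, py, 1]."""
    return [
        -0.5 * a,
        -b,
        -0.5 * c,
        a * mx + b * my,
        c * my + b * mx,
        -0.5 * (a * mx * mx + 2.0 * b * mx * my + c * my * my),
    ]

def pixel_col(px, py):
    """Monomial basis evaluated at one pixel."""
    return [px * px, px * py, py * py, px, py, 1.0]

def exponents_matmul(gaussians, pixels):
    """All exponents at once: each entry is a length-6 dot product."""
    rows = [gaussian_row(*g) for g in gaussians]
    cols = [pixel_col(*p) for p in pixels]
    return [[sum(r * c for r, c in zip(row, col)) for col in cols] for row in rows]

def exponents_direct(gaussians, pixels):
    """Reference: the usual per-pair evaluation done on CUDA cores."""
    out = []
    for mx, my, a, b, c in gaussians:
        line = []
        for px, py in pixels:
            dx, dy = px - mx, py - my
            line.append(-0.5 * (a * dx * dx + c * dy * dy) - b * dx * dy)
        out.append(line)
    return out

# Example inputs (made up for illustration).
gaussians = [(1.0, 2.0, 0.5, 0.1, 0.8), (3.5, 0.5, 1.2, -0.2, 0.4)]
pixels = [(0.0, 0.0), (1.0, 2.0), (4.0, 3.0)]
via_matmul = exponents_matmul(gaussians, pixels)
via_direct = exponents_direct(gaussians, pixels)
```

The two functions agree up to floating-point rounding. In reduced precision the expanded form is more sensitive to cancellation between large terms, so a practical version would likely use small tile-local coordinates; that detail is likewise an assumption here, not something the abstract states.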
