Efficient Differentiable Hardware Rasterization for 3D Gaussian Splatting
Yitian Yuan, Qianyue He

TL;DR
This paper introduces a fast, memory-efficient differentiable hardware rasterizer for 3D Gaussian Splatting. It significantly accelerates the backward pass while keeping memory usage low, which makes it suitable for resource-constrained devices.
Contribution
The authors develop a hardware rasterization method that combines programmable blending with a hybrid gradient reduction strategy (quad-level + subgroup), achieving over 10x faster backward rasterization than naive atomic operations and a 3x speedup over the canonical tile-based rasterizer.
Findings
Over 10x faster backward rasterization than naive atomic operations
3.07x acceleration in full pipeline execution on RTX4080 GPUs
16-bit render targets (float16 and unorm16) identified as the optimal accuracy-efficiency trade-off
Abstract
Recent works demonstrate the advantages of hardware rasterization for 3D Gaussian Splatting (3DGS) in forward-pass rendering through fast GPU-optimized graphics and fixed memory footprint. However, extending these benefits to backward-pass gradient computation remains challenging due to graphics pipeline constraints. We present a differentiable hardware rasterizer for 3DGS that overcomes the memory and performance limitations of tile-based software rasterization. Our solution employs programmable blending for per-pixel gradient computation combined with a hybrid gradient reduction strategy (quad-level + subgroup) in fragment shaders, achieving over 10x faster backward rasterization versus naive atomic operations and 3x speedup over the canonical tile-based rasterizer. Systematic evaluation reveals 16-bit render targets (float16 and unorm16) as the optimal accuracy-efficiency trade-off,…
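To make the idea of hierarchical gradient reduction concrete, here is a minimal CPU-side sketch in Python. It is not the authors' shader code: it only emulates the general pattern of summing per-fragment gradients within a 2x2 quad, then within a subgroup, and issuing one atomic add per Gaussian per subgroup, and it compares the number of atomic operations against a naive one-atomic-per-fragment approach. The function names, the subgroup width of 32, and the assumption that neighbouring fragments often belong to the same Gaussian are all illustrative assumptions.

```python
import numpy as np

QUAD = 4        # fragments in a 2x2 quad
SUBGROUP = 32   # assumed subgroup width; this is hardware-dependent


def naive_atomic(ids, grads, n_gauss):
    """Baseline: every fragment issues its own atomic add."""
    out = np.zeros(n_gauss)
    np.add.at(out, ids, grads)  # unbuffered scatter-add, emulates atomics
    return out, len(ids)


def hybrid_reduce(ids, grads, n_gauss):
    """Emulated quad-level + subgroup-level reduction before the atomic add."""
    out = np.zeros(n_gauss)
    atomics = 0
    for s in range(0, len(ids), SUBGROUP):
        sub_ids = ids[s:s + SUBGROUP]
        sub_grads = grads[s:s + SUBGROUP]

        # Stage 1: within each quad, sum fragments that share a Gaussian id.
        partial = []
        for q in range(0, len(sub_ids), QUAD):
            q_ids = sub_ids[q:q + QUAD]
            q_grads = sub_grads[q:q + QUAD]
            for gid in np.unique(q_ids):
                partial.append((int(gid), q_grads[q_ids == gid].sum()))

        # Stage 2: merge the quad partial sums across the subgroup.
        merged = {}
        for gid, val in partial:
            merged[gid] = merged.get(gid, 0.0) + val

        # Stage 3: one atomic add per distinct Gaussian per subgroup.
        for gid, val in merged.items():
            out[gid] += val
            atomics += 1
    return out, atomics


# Toy input: 256 fragments, each run of 64 consecutive fragments
# belongs to the same (hypothetical) Gaussian.
rng = np.random.default_rng(0)
ids = np.repeat(np.arange(4), 64)
grads = rng.standard_normal(ids.size)

ref, n_naive = naive_atomic(ids, grads, 4)
red, n_hybrid = hybrid_reduce(ids, grads, 4)
print("atomics (naive, hybrid):", n_naive, n_hybrid)
print("results match:", np.allclose(ref, red))
```

In this toy case both paths produce the same accumulated gradients, while the hybrid path needs far fewer atomic adds because each subgroup covers a single Gaussian. On a GPU the two reduction stages would use quad and subgroup operations in the fragment shader rather than Python loops, and the savings depend on how coherent the fragments within a subgroup actually are.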
