Efficient Differentiable Hardware Rasterization for 3D Gaussian Splatting

Yitian Yuan; Qianyue He

arXiv:2505.18764 · cs.GR · August 14, 2025

TL;DR

This paper presents a differentiable hardware rasterizer for 3D Gaussian Splatting (3DGS) that speeds up backward-pass gradient computation while keeping memory usage low, making it suitable for resource-constrained devices.

Contribution

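The authors build a differentiable hardware rasterizer that uses programmable blending for per-pixel gradient computation and a hybrid (quad-level + subgroup) gradient reduction in fragment shaders. It achieves over 10x faster backward rasterization than naive atomic operations and a 3x speedup over the canonical tile-based rasterizer.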

Findings

01

Over 10x faster backward rasterization than naive atomic operations

02

3.07x acceleration in full pipeline execution on RTX4080 GPUs

03

16-bit render targets (float16 and unorm16) give the best accuracy-efficiency trade-off among the evaluated formats
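
To give a feel for why the choice of render-target format matters, the following is a minimal sketch of the round-off each format introduces when it stores a value in [0, 1]. It is not the paper's evaluation: it measures only representational error on a fixed sample grid, not gradient accuracy in a training run. The helper names (`quantize_float16`, `quantize_unorm`, `max_abs_error`) are hypothetical, and unorm8 is included here as an assumed lower-precision baseline for contrast.

```python
import struct


def quantize_float16(x):
    """Round-trip x through IEEE 754 half precision (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def quantize_unorm(x, bits):
    """Store x as an unsigned normalized integer with the given bit width."""
    levels = (1 << bits) - 1
    clamped = min(max(x, 0.0), 1.0)
    return round(clamped * levels) / levels


def max_abs_error(quantize, samples):
    """Largest absolute round-off over the sample set."""
    return max(abs(quantize(x) - x) for x in samples)


# Assumed test grid: 1001 evenly spaced values in [0, 1].
samples = [i / 1000.0 for i in range(1001)]

errors = {
    "unorm8": max_abs_error(lambda x: quantize_unorm(x, 8), samples),
    "unorm16": max_abs_error(lambda x: quantize_unorm(x, 16), samples),
    "float16": max_abs_error(quantize_float16, samples),
}

for name, err in errors.items():
    print(f"{name}: max abs round-off = {err:.2e}")
```

On this grid both 16-bit formats show a smaller worst-case round-off than unorm8. How that translates into gradient accuracy and speed on real hardware is what the paper's systematic evaluation addresses.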

Abstract

Recent works demonstrate the advantages of hardware rasterization for 3D Gaussian Splatting (3DGS) in forward-pass rendering through fast GPU-optimized graphics and fixed memory footprint. However, extending these benefits to backward-pass gradient computation remains challenging due to graphics pipeline constraints. We present a differentiable hardware rasterizer for 3DGS that overcomes the memory and performance limitations of tile-based software rasterization. Our solution employs programmable blending for per-pixel gradient computation combined with a hybrid gradient reduction strategy (quad-level + subgroup) in fragment shaders, achieving over 10x faster backward rasterization versus naive atomic operations and 3x speedup over the canonical tile-based rasterizer. Systematic evaluation reveals 16-bit render targets (float16 and unorm16) as the optimal accuracy-efficiency trade-off,…
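
The hybrid gradient reduction mentioned in the abstract can be illustrated with a small CPU-side simulation. This is a sketch under stated assumptions, not the authors' shader code: fragments are modelled as `(gaussian_id, gradient)` pairs, a quad is assumed to hold 4 fragments and a subgroup 32, and the function names are hypothetical. On a GPU the two stages would use quad and subgroup operations inside the fragment shader; here they are plain Python loops, and the code only counts how many atomic adds would reach global memory.

```python
from collections import defaultdict


def naive_atomic_reduce(fragments):
    """Baseline: every fragment issues one (simulated) atomic add."""
    grads = defaultdict(float)
    atomic_ops = 0
    for gaussian_id, grad in fragments:
        grads[gaussian_id] += grad
        atomic_ops += 1
    return dict(grads), atomic_ops


def _partial_sums(group):
    """Sum the gradients in a group, keyed by Gaussian id."""
    sums = defaultdict(float)
    for gaussian_id, grad in group:
        sums[gaussian_id] += grad
    return list(sums.items())


def hybrid_reduce(fragments, quad_size=4, subgroup_size=32):
    """Reduce within quads, then within subgroups, then add atomically.

    quad_size and subgroup_size are assumed values for illustration.
    """
    grads = defaultdict(float)
    atomic_ops = 0
    for start in range(0, len(fragments), subgroup_size):
        subgroup = fragments[start:start + subgroup_size]
        # Stage 1: quad-level partial sums.
        quad_partials = []
        for q in range(0, len(subgroup), quad_size):
            quad_partials.extend(_partial_sums(subgroup[q:q + quad_size]))
        # Stage 2: subgroup-level sums, one atomic add per Gaussian.
        for gaussian_id, total in _partial_sums(quad_partials):
            grads[gaussian_id] += total
            atomic_ops += 1
    return dict(grads), atomic_ops


# Toy input: 4 Gaussians, each covering 64 consecutive fragments.
fragments = [(i // 64, 0.5) for i in range(256)]
naive_grads, naive_ops = naive_atomic_reduce(fragments)
hybrid_grads, hybrid_ops = hybrid_reduce(fragments)
print(naive_ops, hybrid_ops)  # prints "256 8"
```

Both paths produce the same per-Gaussian gradient sums in this toy case, while the hybrid path issues far fewer atomic adds. The reduction in contention is the intuition behind the reported backward-pass speedup; the actual gain depends on the hardware and scene.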
