Quantile Rendering: Efficiently Embedding High-dimensional Feature on 3D Gaussian Splatting
Yoonwoo Jeong, Cheng Sun, Frank Wang, Minsu Cho, Jaesung Choe

TL;DR
This paper introduces Quantile Rendering, a novel method for efficiently rendering high-dimensional features in 3D Gaussian Splatting, significantly improving speed and quality for open-vocabulary segmentation tasks.
Contribution
We propose Quantile Rendering (Q-Render), a new sparse sampling strategy for high-dimensional features in 3D Gaussian Splatting, and develop GS-Net for generalizable Gaussian feature prediction.
Findings
Outperforms state-of-the-art methods on ScanNet and LeRF datasets.
Achieves approximately 43.7x speedup in rendering with high-dimensional features.
Maintains high segmentation quality with efficient sparse sampling.
Abstract
Recent advancements in computer vision have successfully extended open-vocabulary segmentation (OVS) to the 3D domain by leveraging 3D Gaussian Splatting (3D-GS). Despite this progress, efficiently rendering the high-dimensional features required for open-vocabulary queries poses a significant challenge. Existing methods employ codebooks or feature compression, which cause information loss and thereby degrade segmentation quality. To address this limitation, we introduce Quantile Rendering (Q-Render), a novel rendering strategy for 3D Gaussians that efficiently handles high-dimensional features while maintaining high fidelity. Unlike conventional volume rendering, which densely samples all 3D Gaussians intersecting each ray, Q-Render sparsely samples only those with dominant influence along the ray. By integrating Q-Render into a generalizable 3D neural network, we also propose Gaussian…
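The contrast the abstract draws between dense volume rendering and sparse quantile sampling can be illustrated with a minimal Python sketch. It is not the authors' Algorithm 1. The function names `dense_render` and `quantile_render` are hypothetical, and two design choices are assumptions made only for illustration: the thresholds are placed on the ray transmittance at the uniform levels (k+1)/(K+1) quoted in the reviews, and the normalization is a plain average over the selected Gaussians.

```python
from typing import List, Sequence, Tuple


def dense_render(alphas: Sequence[float],
                 feats: Sequence[Sequence[float]]) -> List[float]:
    """Conventional alpha compositing: every Gaussian on the ray
    contributes T_i * alpha_i * f_i, so cost grows with N * D."""
    out = [0.0] * len(feats[0])
    transmittance = 1.0
    for alpha, feat in zip(alphas, feats):
        weight = transmittance * alpha
        for d, value in enumerate(feat):
            out[d] += weight * value
        transmittance *= 1.0 - alpha
    return out


def quantile_render(alphas: Sequence[float],
                    feats: Sequence[Sequence[float]],
                    num_quantiles: int) -> Tuple[List[float], List[int]]:
    """Illustrative sparse rendering (assumed scheme, not the paper's):
    walk the ray front to back using only scalar opacities, and pick the
    Gaussian at which transmittance falls to each level (k+1)/(K+1)."""
    K = num_quantiles
    # Transmittance decreases along the ray, so visit the levels high to low.
    thresholds = [(k + 1) / (K + 1) for k in reversed(range(K))]
    picked: List[int] = []
    transmittance = 1.0
    next_q = 0
    for i, alpha in enumerate(alphas):
        transmittance *= 1.0 - alpha
        # One very opaque Gaussian may cross several levels at once.
        while next_q < K and transmittance <= thresholds[next_q]:
            picked.append(i)
            next_q += 1
        if next_q == K:
            break  # all quantiles found; remaining Gaussians are skipped

    out = [0.0] * len(feats[0])
    if not picked:
        return out, picked  # ray is nearly transparent; nothing selected
    # High-dimensional features are touched only for the picked Gaussians.
    for i in picked:
        for d, value in enumerate(feats[i]):
            out[d] += value
    # Assumed normalization: equal-weight average over the selected samples.
    return [v / len(picked) for v in out], picked


if __name__ == "__main__":
    alphas = [0.3, 0.3, 0.9]
    feats = [[1.0], [2.0], [3.0]]
    print(dense_render(alphas, feats))
    print(quantile_render(alphas, feats, num_quantiles=3))
```

Under these assumptions the per-ray feature work drops from N feature reads to at most K, while the opacity traversal stays scalar. The sparse result is a normalized average, so it matches the dense result only when the dense weights sum to one and are concentrated on the selected Gaussians.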
Peer Reviews
Decision: Submitted to ICLR 2026
1. The paper proposes a quantile sampling strategy that identifies critical Gaussians through transmittance analysis, which is theoretically motivated and practically effective.
2. The method demonstrates superior performance and efficiency on both ScanNet and LeRF-OVS open-vocabulary 3D semantic segmentation benchmarks, achieving ~43.7× speed gains when rendering 512-D feature maps.
1. The paper lacks theoretical analysis of Q-Render's approximation error to volume rendering or convergence guarantees. The mathematical justification for the normalization operation (line 20 in Algorithm 1) is insufficient.
2. While K significantly impacts performance, the paper lacks an adaptive strategy for selecting K. Why is uniform partitioning (k+1)/(K+1) chosen? Have adaptive thresholds been considered?
3. The paper does not sufficiently analyze when Q-Render might fail or how it perf
The idea is interesting and well-motivated. The performances shown in the experiments are good.
1. Paper presentation: The abstract mentions GS-Net and open-vocabulary 3D semantic segmentation without introducing their relationship to Q-Render, which makes it hard to follow. The topic of the paper is unclear. If the proposed method is designed for general high-dimensional feature rendering, why is it only evaluated on the 3D open-vocabulary semantic segmentation task? If the method is specifically designed for 3D open-vocabulary semantic segmentation, then there is a lack of
1. The authors evaluate on two large-scale benchmarks with extensive ablations, qualitative visualization, and speed analyses.
2. The approach achieves >40× rendering speedup for 512-D features while improving mIoU, which is impressive for real-world scalability.
3. Bridges 2D foundation models (CLIP, SAM) with 3D Gaussian representations, a timely and valuable direction for the ICLR community.
1. The quantile sampling justification is intuitive but lacks a quantitative analysis of approximation error relative to volume rendering.
2. Although ablations are provided, an adaptive or learned K would make the method more robust and generalizable.
3. The paper primarily focuses on indoor datasets (ScanNet, LeRF-OVS). Outdoor or multi-view generalization tests would strengthen the claim of scalability.
4. Only MinkUNet and PTv3 are explored. An analysis of architecture-agnostic behavior w
Taxonomy
Topics: 3D Shape Modeling and Analysis · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques
