Quantile Rendering: Efficiently Embedding High-dimensional Feature on 3D Gaussian Splatting

Yoonwoo Jeong; Cheng Sun; Frank Wang; Minsu Cho; Jaesung Choe

arXiv:2512.20927·cs.CV·December 25, 2025

Open Access · 3 Reviews

TL;DR

This paper introduces Quantile Rendering, a novel method for efficiently rendering high-dimensional features in 3D Gaussian Splatting, significantly improving speed and quality for open-vocabulary segmentation tasks.

Contribution

We propose Quantile Rendering (Q-Render), a new sparse sampling strategy for high-dimensional features in 3D Gaussian Splatting, and develop GS-Net for generalizable Gaussian feature prediction.

Findings

1. Outperforms state-of-the-art methods on the ScanNet and LeRF datasets.
2. Achieves an approximately 43.7× rendering speedup for high-dimensional (512-D) feature maps.
3. Maintains high segmentation quality while sampling Gaussians sparsely.

Abstract

Recent advancements in computer vision have successfully extended open-vocabulary segmentation (OVS) to the 3D domain by leveraging 3D Gaussian Splatting (3D-GS). Despite this progress, efficiently rendering the high-dimensional features required for open-vocabulary queries poses a significant challenge. Existing methods employ codebooks or feature compression, causing information loss and thereby degrading segmentation quality. To address this limitation, we introduce Quantile Rendering (Q-Render), a novel rendering strategy for 3D Gaussians that efficiently handles high-dimensional features while maintaining high fidelity. Unlike conventional volume rendering, which densely samples all 3D Gaussians intersecting each ray, Q-Render sparsely samples only those with dominant influence along the ray. By integrating Q-Render into a generalizable 3D neural network, we also propose Gaussian…
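To make the contrast between dense and sparse sampling concrete, here is a minimal Python sketch for a single ray. It is not the authors' Algorithm 1. `dense_render` is standard front-to-back alpha compositing. `quantile_select` and `sparse_render` are hypothetical helpers that assume Gaussians are picked where the accumulated opacity crosses the uniform levels (k+1)/(K+1) mentioned in the reviews on this page, and that the selected features are combined by a plain average. Both the selection rule and the averaging step are assumptions made for illustration.

```python
def dense_render(alphas, feats):
    """Standard front-to-back alpha compositing over every Gaussian on the ray."""
    dim = len(feats[0])
    out = [0.0] * dim
    transmittance = 1.0
    for alpha, feat in zip(alphas, feats):
        weight = transmittance * alpha  # contribution of this Gaussian
        for d in range(dim):
            out[d] += weight * feat[d]
        transmittance *= 1.0 - alpha
    return out


def quantile_select(alphas, num_samples):
    """Indices of Gaussians at which accumulated opacity crosses (k+1)/(K+1).

    Assumption: this crossing rule is an illustrative guess, not the paper's.
    """
    thresholds = [(k + 1) / (num_samples + 1) for k in range(num_samples)]
    selected = []
    transmittance = 1.0
    k = 0  # index of the next threshold still to be crossed
    for i, alpha in enumerate(alphas):
        transmittance *= 1.0 - alpha
        accumulated = 1.0 - transmittance
        crossed = False
        while k < num_samples and accumulated >= thresholds[k]:
            k += 1
            crossed = True
        if crossed:
            selected.append(i)  # one Gaussian may cover several thresholds
        if k == num_samples:
            break  # all thresholds crossed; the rest of the ray is skipped
    return selected


def sparse_render(alphas, feats, num_samples):
    """Average the features of the selected Gaussians (assumed normalization)."""
    selected = quantile_select(alphas, num_samples)
    dim = len(feats[0])
    if not selected:
        return [0.0] * dim
    out = [0.0] * dim
    for i in selected:
        for d in range(dim):
            out[d] += feats[i][d]
    return [value / len(selected) for value in out]
```

In this sketch the dense path touches the full feature vector of every Gaussian on the ray, while the sparse path reads only the scalar opacities during selection and then at most K feature vectors. That is the kind of saving that matters when each feature has hundreds of dimensions.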

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 6 · Confidence 3

Strengths

1. The paper proposes a quantile sampling strategy that identifies critical Gaussians through transmittance analysis, which is theoretically motivated and practically effective.
2. The method demonstrates superior performance and efficiency on both ScanNet and LeRF-OVS open-vocabulary 3D semantic segmentation benchmarks, achieving ~43.7× speed gains when rendering 512-D feature maps.

Weaknesses

1. The paper lacks theoretical analysis of Q-Render's approximation error to volume rendering or convergence guarantees. The mathematical justification for the normalization operation (line 20 in Algorithm 1) is insufficient.
2. While K significantly impacts performance, the paper lacks an adaptive strategy for selecting K. Why is uniform partitioning (k+1)/(K+1) chosen? Have adaptive thresholds been considered?
3. The paper does not sufficiently analyze when Q-Render might fail or how it perf

Reviewer 02 · Rating 4 · Confidence 4

Strengths

The idea is interesting and well-motivated. The performance shown in the experiments is good.

Weaknesses

1. Paper presentation: In the abstract, the GS-Net and open-vocabulary 3D semantic segmentation are mentioned, without introducing their relationship with the Q-Render, making it hard to follow. The topic of the paper is unclear. If the proposed method is designed for general high-dimensional feature rendering, why is it only evaluated on the 3D open-vocabulary semantic segmentation task? If the method is specifically designed for 3D open-vocabulary semantic segmentation, then there is a lack of

Reviewer 03 · Rating 4 · Confidence 4

Strengths

1. The authors evaluate on two large-scale benchmarks with extensive ablations, qualitative visualization, and speed analyses.
2. The approach achieves >40× rendering speedup for 512-D features while improving mIoU, which is impressive for real-world scalability.
3. Bridges 2D foundation models (CLIP, SAM) with 3D Gaussian representations, a timely and valuable direction for the ICLR community.

Weaknesses

1. The quantile sampling justification is intuitive but lacks a quantitative analysis of approximation error relative to volume rendering.
2. Although ablations are provided, an adaptive or learned K would make the method more robust and generalizable.
3. The paper primarily focuses on indoor datasets (ScanNet, LeRF-OVS). Outdoor or multi-view generalization tests would strengthen the claim of scalability.
4. Only MinkUNet and PTv3 are explored. An analysis of architecture-agnostic behavior w

Taxonomy

Topics: 3D Shape Modeling and Analysis · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques