Peering into the Unknown: Active View Selection with Neural Uncertainty Maps for 3D Reconstruction
Zhengquan Zhang, Feng Xu, Mengmi Zhang

TL;DR
This paper introduces UPNet, a neural uncertainty map predictor guiding active view selection for 3D reconstruction, achieving high accuracy with fewer viewpoints and significantly reduced computational costs.
Contribution
The paper presents a novel neural uncertainty map predictor, UPNet, for active view selection in 3D reconstruction, enabling efficient viewpoint selection without additional training.
Findings
Achieves comparable 3D reconstruction accuracy with half the viewpoints.
Reduces computational overhead by up to 400 times.
Generalizes well to new object categories without retraining.
Abstract
Some perspectives naturally provide more information than others. How can an AI system determine which viewpoint offers the most valuable insight for accurate and efficient 3D object reconstruction? Active view selection (AVS) for 3D reconstruction remains a fundamental challenge in computer vision. The aim is to identify the minimal set of views that yields the most accurate 3D reconstruction. Instead of learning radiance fields, like NeRF or 3D Gaussian Splatting, from a current observation and computing uncertainty for each candidate viewpoint, we introduce a novel AVS approach guided by neural uncertainty maps predicted by a lightweight feedforward deep neural network, named UPNet. UPNet takes a single input image of a 3D object and outputs a predicted uncertainty map, representing uncertainty values across all possible candidate viewpoints. By leveraging heuristics derived from…
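To make the selection step concrete, here is a minimal sketch of how a predicted uncertainty map could drive the choice of the next viewpoint. Everything in it is an illustrative assumption rather than the authors' procedure: the function names, the (azimuth, elevation) encoding of candidates, the greedy "highest uncertainty wins" rule, and the `min_sep_deg` angular threshold used to skip candidates close to already-visited views are all hypothetical. The uncertainty values are placeholders standing in for UPNet's output.

```python
import math


def to_unit_vector(azimuth_deg, elevation_deg):
    """Convert a viewpoint given as (azimuth, elevation) in degrees to a unit vector."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))


def angle_between_deg(a, b):
    """Angle in degrees between two unit vectors (dot product clamped for safety)."""
    dot = sum(x * y for x, y in zip(a, b))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))


def select_next_view(candidates, uncertainty, visited, min_sep_deg=30.0):
    """Return the index of the most uncertain candidate that is not near a visited view.

    Assumed heuristic (not taken from the paper): candidates within
    `min_sep_deg` of any visited viewpoint are skipped as redundant, and the
    remaining candidate with the highest predicted uncertainty is chosen.
    Returns None if every candidate is skipped.
    """
    visited_vecs = [to_unit_vector(*v) for v in visited]
    best_idx, best_score = None, -math.inf
    for i, (cand, score) in enumerate(zip(candidates, uncertainty)):
        vec = to_unit_vector(*cand)
        if any(angle_between_deg(vec, v) < min_sep_deg for v in visited_vecs):
            continue  # too close to a view we already have
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx


# Placeholder data: five candidate viewpoints and made-up uncertainty values.
candidates = [(0, 0), (10, 0), (90, 0), (180, 0), (0, 80)]
uncertainty = [0.1, 0.9, 0.6, 0.8, 0.3]
visited = [(0, 0)]

next_idx = select_next_view(candidates, uncertainty, visited)
print(next_idx, candidates[next_idx])
```

In this toy setup the candidate at azimuth 10° has the highest score but lies within 30° of the visited view, so the sketch falls through to the next most uncertain candidate. A single forward pass producing `uncertainty` replaces per-candidate radiance-field evaluation, which is where the efficiency argument in the abstract comes from.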
Peer Reviews
Decision: ICLR 2026 Poster
This idea is very interesting: training a feed-forward uncertainty prediction network on a self-constructed dataset with ground-truth supervision for the AVS task. This can greatly reduce the computational resources and time required for next-best-view selection. Experiments show that, compared with baseline AVS methods, the proposed approach exhibits clear improvements.
This paper proposes an interesting and effective method for the AVS task. However, my main concerns are the experimental evaluation and the value of this task. I find it hard to imagine a scenario where, given a single input view, the system should output where the next view should be obtained. For robotic applications, if it has the ability to move from one viewpoint to another, then it can capture dense views at arbitrary positions. For multi-view reconstruction, all ground truth captured view
* Originality: Next-best-view selection is a classic problem in 3D vision; the paper’s backbone-agnostic approach, predicting a feed-forward uncertainty/score map for view selection, is a novel angle within this space.
* Quality:
  * The solution is simple and effective.
  * The evaluation is comprehensive, covering both synthetic and real scenes, with robustness tests (e.g., lighting and distance). Generalization to multiple reconstruction backbones is also tested and supports the claim.
* Cla
* Uncertainty definition: The paper uses PSNR/SSIM/LPIPS/MSE as “uncertainty” labels; all are photometric and ignore geometry quality. This limits relevance for downstream tasks that depend on accurate shape.
* Anchor design lacks justification: The anchor layout (48 HEALPix points) is fixed without analysis; it is unclear whether performance is bottlenecked by anchor density or discretization choice.
* Candidate sampling is unclear: “Randomly sample 512 candidates” is under-specified (sphere-
1. The paper is very well-written and easy to follow. The overview figures (e.g., Figs. 1 and 2) are high-quality and provide a good overview of the proposed method and task. The structure is logical, and the claims are stated unambiguously.
2. The paper's primary strength is its computational efficiency at inference time. By replacing a full retraining loop with a single forward pass of the lightweight UPNet, the authors achieve a reported 400x speedup in selection time and massive reductions
1. The paper's premise rests on the proposed ground-truth uncertainty maps. However, these maps do not actually measure uncertainty (as entropy or variance would); they record the reconstruction error of a specific single-view reconstruction model. I would like to see a more formal analysis of why the model should be an effective policy for guiding a multi-view reconstruction, especially for a completely different model class like NeRF.
2. I also believe that the main claim regarding efficiency i
Taxonomy
Topics: Advanced Vision and Imaging · Image and Object Detection Techniques · Image Processing Techniques and Applications
