NVS-HO: A Benchmark for Novel View Synthesis of Handheld Objects
Musawar Ali, Manuel Carranza-García, Nicola Fioraio, Samuele Salti, and Luigi Di Stefano

TL;DR
NVS-HO is a new benchmark for evaluating novel view synthesis of handheld objects in real-world settings using RGB images, highlighting current method limitations and guiding future research.
Contribution
It introduces the first benchmark for handheld object view synthesis with real-world data and provides baseline evaluations with multiple methods.
Findings
Current methods show significant performance gaps under real-world handheld conditions.
The benchmark facilitates the development of more robust NVS approaches.
The dataset provides ground-truth camera poses and images for evaluation.
Abstract
We propose NVS-HO, the first benchmark designed for novel view synthesis of handheld objects in real-world environments using only RGB inputs. Each object is recorded in two complementary RGB sequences: (1) a handheld sequence, where the object is manipulated in front of a static camera, and (2) a board sequence, where the object is fixed on a ChArUco board to provide accurate camera poses via marker detection. The goal of NVS-HO is to learn an NVS model that captures the full appearance of an object from (1), whereas (2) provides the ground-truth images used for evaluation. To establish baselines, we consider both a classical SfM pipeline and a state-of-the-art pre-trained feed-forward neural network (VGGT) as pose estimators, and train NVS models based on NeRF and Gaussian Splatting. Our experiments reveal significant performance gaps in current methods under unconstrained handheld…
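As a rough illustration of how marker detection can yield camera poses for the board sequence, the sketch below inverts a board-to-camera rigid transform to obtain the camera pose in the board frame. It is not taken from the paper: it assumes, as is common for marker-based pose estimators, that detection returns a rotation R and translation t with x_cam = R x_board + t. The function name invert_rigid_transform and the example values are hypothetical.

```python
from typing import List, Tuple

Matrix3 = List[List[float]]
Vector3 = List[float]


def invert_rigid_transform(R: Matrix3, t: Vector3) -> Tuple[Matrix3, Vector3]:
    """Invert a rigid transform.

    Assumption: x_cam = R @ x_board + t (board-to-camera).
    Returns (R_inv, t_inv) such that x_board = R_inv @ x_cam + t_inv.
    t_inv is also the camera centre expressed in the board frame.
    """
    # The inverse of a rotation matrix is its transpose.
    R_inv = [[R[j][i] for j in range(3)] for i in range(3)]
    # t_inv = -R^T t
    t_inv = [-sum(R_inv[i][k] * t[k] for k in range(3)) for i in range(3)]
    return R_inv, t_inv


# Hypothetical example: the board is rotated 90 degrees about the z-axis
# and sits 2 units in front of the camera along its optical axis.
R = [[0.0, -1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 2.0]

R_cam_to_board, camera_center = invert_rigid_transform(R, t)
```

Since the object is fixed to the board, a camera pose in the board frame is also a camera pose relative to the object, which is what an NVS model needs in order to render the evaluation views.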
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Robot Manipulation and Learning · Advanced Image and Video Retrieval Techniques
