Next Best Sense: Guiding Vision and Touch with FisherRF for 3D Gaussian Splatting

Matthew Strong; Boshu Lei; Aiden Swann; Wen Jiang; Kostas Daniilidis; Monroe Kennedy III

arXiv:2410.04680·cs.RO·March 11, 2025

TL;DR

This paper introduces an active view and touch selection framework for robotic scene understanding using 3D Gaussian Splatting, improving performance in limited-view scenarios through novel training and selection methods.

Contribution

It presents an end-to-end online training pipeline with a new semantic depth alignment and extends FisherRF for active view and touch selection in 3DGS.

Findings

01. Enhanced 3D scene reconstruction in few-view settings

02. Improved view and touch pose selection accuracy

03. Demonstrated real-time performance on a robotic system

Abstract

We propose a framework for active next best view and touch selection for robotic manipulators using 3D Gaussian Splatting (3DGS). 3DGS is emerging as a useful explicit 3D scene representation for robotics, as it can represent scenes in a manner that is both photorealistic and geometrically accurate. However, in real-world, online robotic scenes where efficiency requirements limit the number of views, random view selection for 3DGS becomes impractical because views are often overlapping and redundant. We address this issue by proposing an end-to-end online training and active view selection pipeline, which enhances the performance of 3DGS in few-view robotics settings. We first elevate the performance of few-shot 3DGS with a novel semantic depth alignment method using Segment Anything Model 2 (SAM2) that we supplement with Pearson depth and surface normal loss to improve…
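The abstract mentions a Pearson depth loss without giving its form. A common way to write such a loss is one minus the Pearson correlation between the rendered depth and a reference depth (for example, a monocular depth estimate), which makes the loss insensitive to an unknown positive scale and offset in the reference. The sketch below illustrates that idea only; the function name `pearson_depth_loss`, the `eps` parameter, and the flattened-list inputs are assumptions made for illustration, and the paper's actual formulation may differ.

```python
import math
from typing import Sequence


def pearson_depth_loss(
    rendered: Sequence[float],
    reference: Sequence[float],
    eps: float = 1e-8,
) -> float:
    """Return 1 - Pearson correlation between two flattened depth maps.

    Hypothetical helper for illustration; not taken from the paper's code.
    """
    if len(rendered) != len(reference) or len(rendered) == 0:
        raise ValueError("depth maps must be non-empty and the same length")

    n = len(rendered)
    mean_r = sum(rendered) / n
    mean_f = sum(reference) / n

    # Unnormalized covariance and variances (the 1/n factors cancel).
    cov = sum((r - mean_r) * (f - mean_f) for r, f in zip(rendered, reference))
    var_r = sum((r - mean_r) ** 2 for r in rendered)
    var_f = sum((f - mean_f) ** 2 for f in reference)

    # eps guards against division by zero for constant depth maps.
    corr = cov / (math.sqrt(var_r * var_f) + eps)
    return 1.0 - corr


# A reference depth that differs only by scale and offset gives a loss near 0.
aligned = pearson_depth_loss([1.0, 2.0, 3.0, 4.0], [12.0, 22.0, 32.0, 42.0])
# A reference with reversed depth ordering gives a loss near 2.
reversed_order = pearson_depth_loss([1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0])
```

Because the correlation is unchanged by positive scaling and shifting, a loss of this kind penalizes disagreement in relative depth structure rather than in absolute metric depth.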

Code & Models

Repositories

armlabstanford/NextBestSense
PyTorch · Official

Taxonomy

Topics: Interactive and Immersive Displays · Augmented Reality Applications · Robotics and Sensor-Based Localization