Multimodal 3D Fusion and In-Situ Learning for Spatially Aware AI

Chengyuan Xu; Radha Kumaran; Noah Stier; Kangyou Yu; Tobias Höllerer

arXiv:2410.04652·cs.HC·October 8, 2024

TL;DR

This paper introduces a multimodal 3D representation combining semantic, linguistic, and geometric data, enabling in-situ machine learning for AR applications like spatial search and inventory management.

Contribution

It presents a novel multimodal 3D reconstruction pipeline and in-situ learning framework that integrate vision-language features for enhanced AR environment understanding.
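To illustrate how per-pixel vision-language features might be fused into a 3D model, here is a minimal sketch. It is not the authors' pipeline. It assumes each frame's pixels have already been back-projected to world-space points and paired with CLIP feature vectors. It then keeps a running weighted average of the features in each voxel, in the spirit of TSDF-style integration. The class name FeatureVoxelGrid, its parameters, and the uniform per-observation weight of 1 are all hypothetical choices.

```python
import numpy as np


class FeatureVoxelGrid:
    """Hypothetical sparse voxel grid holding a running-average feature per voxel."""

    def __init__(self, voxel_size: float, feat_dim: int):
        self.voxel_size = voxel_size
        self.feat_dim = feat_dim
        self.features = {}  # voxel index (i, j, k) -> mean feature vector
        self.weights = {}   # voxel index (i, j, k) -> accumulated observation weight

    def integrate(self, points: np.ndarray, feats: np.ndarray) -> None:
        """Fuse one frame: points is (N, 3) in world space, feats is (N, D)."""
        voxel_idx = np.floor(points / self.voxel_size).astype(np.int64)
        for row, f in zip(voxel_idx, feats):
            key = tuple(int(v) for v in row)
            w_old = self.weights.get(key, 0.0)
            f_old = self.features.get(key, np.zeros(self.feat_dim))
            # Weighted running average; every observation is given weight 1 here.
            self.features[key] = (w_old * f_old + f) / (w_old + 1.0)
            self.weights[key] = w_old + 1.0
```

A real system would likely weight observations by depth confidence or viewing angle and store the grid on the GPU. The uniform weight above only keeps the averaging logic easy to follow.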

Findings

01. Effective fusion of CLIP features into 3D models.

02. Successful demonstration of spatial search and inventory tracking in AR.

03. Open-source implementation and demo data provided.
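Spatial search over such a fused model can be pictured as ranking voxels by how well their features match a query embedding. The sketch below assumes the query vector was produced by a CLIP text encoder with the same dimensionality as the fused features, and it uses plain cosine similarity. The function name spatial_search and its parameters are hypothetical and do not come from the paper or its repository.

```python
import numpy as np


def spatial_search(voxel_coords: np.ndarray, voxel_feats: np.ndarray,
                   query_emb: np.ndarray, top_k: int = 1, eps: float = 1e-8):
    """Return the top_k voxel coordinates whose features best match the query.

    voxel_coords: (M, 3) voxel indices or centers.
    voxel_feats:  (M, D) fused per-voxel features.
    query_emb:    (D,) query embedding, assumed to come from a CLIP text encoder.
    """
    # L2-normalize both sides so the dot product equals cosine similarity.
    feats = voxel_feats / (np.linalg.norm(voxel_feats, axis=1, keepdims=True) + eps)
    query = query_emb / (np.linalg.norm(query_emb) + eps)
    scores = feats @ query
    # Sort by descending similarity and keep the best top_k matches.
    order = np.argsort(-scores)[:top_k]
    return voxel_coords[order], scores[order]
```

In an AR interface, the returned coordinates could be highlighted in the user's view. Thresholding the scores instead of taking a fixed top_k is an equally plausible design.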

Abstract

Seamless integration of virtual and physical worlds in augmented reality benefits from the system semantically "understanding" the physical environment. AR research has long focused on the potential of context awareness, demonstrating novel capabilities that leverage the semantics in the 3D environment for various object-level interactions. Meanwhile, the computer vision community has made leaps in neural vision-language understanding to enhance environment perception for autonomous tasks. In this work, we introduce a multimodal 3D object representation that unifies both semantic and linguistic knowledge with the geometric representation, enabling user-guided machine learning involving physical objects. We first present a fast multimodal 3D reconstruction pipeline that brings linguistic understanding to AR by fusing CLIP vision-language features into the environment and object models.…
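The abstract also describes user-guided machine learning on physical objects, but it does not spell out the learning method. The sketch below therefore swaps in a simple nearest-centroid classifier as one plausible way a user could teach labels in situ. Each labeled object contributes its object-level feature vector (for example, an averaged CLIP feature) to a class centroid. A new object is then assigned to the centroid with the highest cosine similarity. The class NearestCentroidLearner and its methods are hypothetical.

```python
import numpy as np


class NearestCentroidLearner:
    """Hypothetical few-shot learner over object-level feature vectors."""

    def __init__(self):
        self.sums = {}    # label -> sum of L2-normalized example features
        self.counts = {}  # label -> number of examples given for that label

    def add_example(self, label: str, feat: np.ndarray) -> None:
        """Record one user-labeled object."""
        f = feat / np.linalg.norm(feat)
        self.sums[label] = self.sums.get(label, np.zeros_like(f)) + f
        self.counts[label] = self.counts.get(label, 0) + 1

    def predict(self, feat: np.ndarray):
        """Return the label whose centroid is most cosine-similar, or None if untrained."""
        f = feat / np.linalg.norm(feat)
        best_label, best_score = None, -np.inf
        for label, s in self.sums.items():
            centroid = s / self.counts[label]
            score = float(f @ centroid / np.linalg.norm(centroid))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

A centroid classifier needs no gradient training, so it can update immediately each time the user labels an object. That immediacy suits interactive, in-situ use, though the paper's framework may work differently.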

Code & Models

Repositories

cy-xu/spatially_aware_ai (PyTorch, official)

Taxonomy

Topics: Robotics and Sensor-Based Localization · Video Surveillance and Tracking Methods

Methods: Contrastive Language-Image Pre-training · Attentive Walk-Aggregating Graph Neural Network