COMPASS: Contrastive Multimodal Pretraining for Autonomous Systems

Shuang Ma; Sai Vemprala; Wenshan Wang; Jayesh K. Gupta; Yale Song; Daniel McDuff; Ashish Kapoor

arXiv:2203.15788 · cs.RO · March 30, 2022

Open Access

TL;DR

COMPASS is a novel multimodal pretraining framework that constructs a graph-based representation to learn generalizable state representations for autonomous systems across various tasks and environments.

Contribution

It introduces a general-purpose pretraining pipeline that leverages multimodal graphs and latent space factorization to improve autonomous system representations.

Findings

01. Pretrained on the TartanAir dataset, COMPASS performs well on drone navigation, vehicle racing, and visual odometry.

02. COMPASS generalizes to unseen environments and real-world data.

03. It effectively models temporal dynamics, geometry, and semantics in multimodal data.

Abstract

Learning representations that generalize across tasks and domains is challenging yet necessary for autonomous systems. Although task-driven approaches are appealing, designing models specific to each application can be difficult in the face of limited data, especially when dealing with highly variable multimodal input spaces arising from different tasks in different environments. We introduce the first general-purpose pretraining pipeline, COntrastive Multimodal Pretraining for AutonomouS Systems (COMPASS), to overcome the limitations of task-specific models and existing pretraining approaches. COMPASS constructs a multimodal graph by considering the essential information for autonomous systems and the properties of different modalities. Through this graph, multimodal signals are connected and mapped into two factorized spatio-temporal latent spaces: a "motion pattern space" and a…

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques