OLiVia-Nav: An Online Lifelong Vision Language Approach for Mobile Robot Social Navigation

Siddarth Narasimhan; Aaron Hao Tan; Daniel Choi; Goldie Nejat

arXiv:2409.13675·cs.RO·March 11, 2025

Open Access

TL;DR

OLiVia-Nav is a novel online lifelong vision language framework that enables social navigation for robots by integrating large vision-language models with continual learning, improving adaptability and social compliance in dynamic environments.

Contribution

The paper introduces SC-CLIP, a distillation method that transfers social reasoning from large VLMs to lightweight models, enabling real-time social navigation adaptation.
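The exact SC-CLIP objective is not given on this page. As a rough illustration of the contrastive language-image pre-training idea the name refers to, the sketch below computes a standard CLIP-style symmetric contrastive loss over a batch of matched image and text embeddings. The function names, the toy embeddings, and the temperature value are all assumptions made for illustration. How SC-CLIP uses the large VLM's outputs as a distillation signal is not specified here and is not modeled.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def log_softmax(row):
    # Numerically stable log-softmax over one row of logits.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss over matched (image, text) pairs.

    Hypothetical helper: pair i in image_embs is assumed to match pair i
    in text_embs. The temperature is an illustrative value, not one taken
    from the paper.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j]: cosine similarity of image i and text j, divided by temperature.
    logits = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Image-to-text direction: each row should favor its diagonal entry.
    loss_i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image direction: same computation on the transposed matrix.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return 0.5 * (loss_i2t + loss_t2i)

# Toy check: correctly paired embeddings should score a lower loss than swapped ones.
aligned = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In a distillation setting, the text side would plausibly come from descriptions produced by a larger model while a smaller encoder is trained to match them, but that pairing is an assumption here rather than a detail confirmed by this page.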

Findings

1. OLiVia-Nav outperforms state-of-the-art methods in social navigation metrics.

2. The approach effectively encodes social and environmental context during navigation.

3. Ablation studies confirm the importance of each component in the system.

Abstract

Service robots in human-centered environments such as hospitals, office buildings, and long-term care homes need to navigate while adhering to social norms to ensure the safety and comfort of the people with whom they share the space. Furthermore, they need to adapt to new social scenarios that can arise during robot navigation. In this paper, we present a novel Online Lifelong Vision Language architecture, OLiVia-Nav, which uniquely integrates vision-language models (VLMs) with an online lifelong learning framework for robot social navigation. We introduce a unique distillation approach, Social Context Contrastive Language Image Pre-training (SC-CLIP), to transfer the social reasoning capabilities of large VLMs to a lightweight VLM, so that OLiVia-Nav can directly encode social and environment context during robot navigation. These encoded embeddings are used to generate…

Taxonomy

Topics: Multimodal Machine Learning Applications · Social Robot Interaction and HRI · Robotics and Automated Systems