PhysNav-DG: A Novel Adaptive Framework for Robust VLM-Sensor Fusion in Navigation Applications

Trisanth Srinivasan; Santosh Patapati

arXiv:2505.01881 · cs.CV · June 16, 2025

TL;DR

PhysNav-DG is an innovative framework combining sensor fusion with vision-language models, enhancing navigation accuracy and explainability across diverse environments through a new benchmark and adaptive filtering.

Contribution

It introduces PhysNav-DG, a dual-branch architecture integrating classical sensor fusion with semantic models, and the MD-NEX Benchmark for multi-domain navigation evaluation.

Findings

01. Navigation success rates improved by over 20%

02. High grounding and clarity of explanations achieved

03. Adaptive Kalman Filter enhances environmental adaptability

Abstract

Robust navigation in diverse environments and domains requires both accurate state estimation and transparent decision making. We present PhysNav-DG, a novel framework that integrates classical sensor fusion with the semantic power of vision-language models. Our dual-branch architecture predicts navigation actions from multi-sensor inputs while simultaneously generating detailed chain-of-thought explanations. A modified Adaptive Kalman Filter dynamically adjusts its noise parameters based on environmental context. It leverages several streams of raw sensor data along with semantic insights from models such as LLaMA 3.2 11B and BLIP-2. To evaluate our approach, we introduce the MD-NEX Benchmark, a novel multi-domain dataset that unifies indoor navigation, autonomous driving, and social navigation tasks with ground-truth actions and human-validated explanations. Extensive experiments and…
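The idea of a Kalman filter whose noise parameters depend on environmental context can be illustrated with a minimal scalar sketch. This is not the authors' implementation: the class `AdaptiveKalman1D`, the table `CONTEXT_NOISE_SCALE`, its context labels, and all numeric values are hypothetical, chosen only to show how inflating the measurement noise in a harder context lowers the filter's trust in new sensor readings.

```python
from dataclasses import dataclass

# Hypothetical mapping from a context label to a measurement-noise multiplier.
# The labels and values are assumptions for illustration, not taken from the paper.
CONTEXT_NOISE_SCALE = {"indoor": 1.0, "driving": 2.0, "crowded": 4.0}


@dataclass
class AdaptiveKalman1D:
    """Scalar Kalman filter with a constant-state model (illustrative only)."""

    x: float = 0.0   # current state estimate
    p: float = 1.0   # variance of the state estimate
    q: float = 0.01  # base process-noise variance
    r: float = 0.1   # base measurement-noise variance

    def step(self, z: float, context: str) -> float:
        """Fuse one measurement z and return the Kalman gain that was used."""
        # Adapt the measurement noise to the context (unknown labels use 1.0).
        r_eff = self.r * CONTEXT_NOISE_SCALE.get(context, 1.0)

        # Predict: the state is assumed constant, so only the variance grows.
        p_pred = self.p + self.q

        # Update: a larger r_eff yields a smaller gain, so z is trusted less.
        gain = p_pred / (p_pred + r_eff)
        self.x = self.x + gain * (z - self.x)
        self.p = (1.0 - gain) * p_pred
        return gain


if __name__ == "__main__":
    calm = AdaptiveKalman1D()
    busy = AdaptiveKalman1D()
    print("gain indoor :", calm.step(1.0, "indoor"))
    print("gain crowded:", busy.step(1.0, "crowded"))
```

Starting from the same prior, the filter in the "crowded" context applies a smaller gain than the one in the "indoor" context, so the same reading moves its estimate less. A fuller version would carry a state vector and covariance matrices, and could derive the context from a vision-language model's output rather than a fixed label.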

Taxonomy

Topics: Inertial Sensor and Navigation · Target Tracking and Data Fusion in Sensor Networks

Methods: LLaMA