PhysNav-DG: A Novel Adaptive Framework for Robust VLM-Sensor Fusion in Navigation Applications
Trisanth Srinivasan, Santosh Patapati

TL;DR
PhysNav-DG is an innovative framework combining sensor fusion with vision-language models, enhancing navigation accuracy and explainability across diverse environments through a new benchmark and adaptive filtering.
Contribution
It introduces PhysNav-DG, a dual-branch architecture integrating classical sensor fusion with semantic models, and the MD-NEX Benchmark for multi-domain navigation evaluation.
Findings
Navigation success rates improved by over 20%
High grounding and clarity of explanations achieved
Adaptive Kalman Filter enhances environmental adaptability
Abstract
Robust navigation in diverse environments and domains requires both accurate state estimation and transparent decision making. We present PhysNav-DG, a novel framework that integrates classical sensor fusion with the semantic power of vision-language models. Our dual-branch architecture predicts navigation actions from multi-sensor inputs while simultaneously generating detailed chain-of-thought explanations. A modified Adaptive Kalman Filter dynamically adjusts its noise parameters based on environmental context. It leverages several streams of raw sensor data along with semantic insights from models such as LLaMA 3.2 11B and BLIP-2. To evaluate our approach, we introduce the MD-NEX Benchmark, a novel multi-domain dataset that unifies indoor navigation, autonomous driving, and social navigation tasks with ground-truth actions and human-validated explanations. Extensive experiments and…
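To make the idea of context-dependent noise adaptation concrete, here is a minimal sketch of a scalar Kalman filter whose measurement-noise variance is rescaled by an environment label. It is not the authors' filter: the context labels, the scale factors in `CONTEXT_NOISE_SCALE`, the class name `AdaptiveKalman1D`, and the constant-state motion model are all assumptions chosen for illustration, and the abstract does not specify how PhysNav-DG derives its noise parameters from the vision-language model.

```python
from dataclasses import dataclass

# Hypothetical context labels and noise multipliers (illustrative only,
# not taken from the paper). A larger scale means sensors are trusted less.
CONTEXT_NOISE_SCALE = {
    "indoor": 1.0,
    "driving": 2.0,
    "crowded": 4.0,
}


@dataclass
class AdaptiveKalman1D:
    """Scalar Kalman filter with context-scaled measurement noise."""
    x: float = 0.0          # state estimate
    p: float = 1.0          # variance of the state estimate
    q: float = 0.01         # process-noise variance
    r: float = 0.1          # base measurement-noise variance
    last_gain: float = 0.0  # Kalman gain used in the most recent update

    def step(self, z: float, context: str) -> float:
        # Adapt the measurement noise to the reported context;
        # unknown contexts fall back to the base value.
        r_eff = self.r * CONTEXT_NOISE_SCALE.get(context, 1.0)

        # Predict: constant-state model, so only the variance grows.
        p_pred = self.p + self.q

        # Update: blend prediction and measurement using the Kalman gain.
        gain = p_pred / (p_pred + r_eff)
        self.x = self.x + gain * (z - self.x)
        self.p = (1.0 - gain) * p_pred
        self.last_gain = gain
        return self.x


if __name__ == "__main__":
    calm = AdaptiveKalman1D()
    busy = AdaptiveKalman1D()
    calm.step(1.0, "indoor")
    busy.step(1.0, "crowded")
    # The filter in the noisier context moves less toward the measurement.
    print(calm.last_gain > busy.last_gain)
```

With identical measurements, the filter given the "crowded" label applies a smaller gain than the one given "indoor", so its estimate relies more on the prior. In a full system the label would presumably come from the semantic branch rather than being passed in by hand.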
Taxonomy
Topics: Inertial Sensor and Navigation · Target Tracking and Data Fusion in Sensor Networks
Methods: LLaMA
