OmniVLN: Omnidirectional 3D Perception and Token-Efficient LLM Reasoning for Visual-Language Navigation across Air and Ground Platforms

Zhongyuang Liu; Min He; Shaonan Yu; Xinhang Xu; Muqing Cao; Jianping Li; Jianfei Yang; Lihua Xie

arXiv:2603.17351 · cs.RO · March 19, 2026

Open Access

TL;DR

OmniVLN introduces a zero-shot navigation framework combining omnidirectional 3D perception with hierarchical reasoning, significantly enhancing spatial understanding and navigation accuracy for aerial and ground robots in complex indoor environments.

Contribution

The paper presents OmniVLN, a system that fuses omnidirectional sensing with token-efficient hierarchical reasoning to perform visual-language navigation zero-shot, without task-specific training.

Findings

01. Navigation success improved by up to 11.68% over the baseline.

02. Spatial referring accuracy increased from 77.27% to 93.18%.

03. Prompt tokens reduced by up to 61.7% in cluttered environments.

Abstract

Language-guided embodied navigation requires an agent to interpret object-referential instructions, search across multiple rooms, localize the referenced target, and execute reliable motion toward it. Existing systems remain limited in real indoor environments because narrow field-of-view sensing exposes only a partial local scene at each step, often forcing repeated rotations, delaying target discovery, and producing fragmented spatial understanding; meanwhile, directly prompting LLMs with dense 3D maps or exhaustive object lists quickly exceeds the context budget. We present OmniVLN, a zero-shot visual-language navigation framework that couples omnidirectional 3D perception with token-efficient hierarchical reasoning for both aerial and ground robots. OmniVLN fuses a rotating LiDAR and panoramic vision into a hardware-agnostic mapping stack, incrementally constructs a five-layer…
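
The context-budget problem named in the abstract can be illustrated with a toy comparison. The sketch below is not the OmniVLN pipeline: the scene, the room names, the whitespace word count used as a token proxy, and the hard-coded room choice (which an LLM would make in practice) are all assumptions made for illustration. It only shows why querying a hierarchy in stages tends to need a smaller prompt than listing every object at once.

```python
# Illustrative only: the scene, room names, and word-count token proxy are
# assumptions for this sketch, not details taken from the OmniVLN paper.

def approx_tokens(text: str) -> int:
    """Crude token proxy: number of whitespace-separated words."""
    return len(text.split())

# Hypothetical scene: each room maps to the objects detected in it.
scene = {
    "kitchen": ["fridge", "sink", "kettle", "table", "chair"],
    "office": ["desk", "monitor", "chair", "bookshelf", "lamp"],
    "bedroom": ["bed", "wardrobe", "lamp", "nightstand", "mirror"],
}

def flat_prompt(scene: dict[str, list[str]]) -> str:
    """Exhaustive listing: one 'room: object' line per object in the scene."""
    lines = [f"{room}: {obj}" for room, objs in scene.items() for obj in objs]
    return "\n".join(lines)

def hierarchical_prompts(scene: dict[str, list[str]], chosen_room: str) -> tuple[str, str]:
    """Stage 1 lists only room names; stage 2 lists objects in one room."""
    stage1 = "rooms: " + " ".join(scene)
    stage2 = f"{chosen_room}: " + " ".join(scene[chosen_room])
    return stage1, stage2

flat_cost = approx_tokens(flat_prompt(scene))

# The room choice is hard-coded here; in a real system it would come from
# the model's answer to the stage-1 prompt.
stage1, stage2 = hierarchical_prompts(scene, "office")
hier_cost = approx_tokens(stage1) + approx_tokens(stage2)

print(f"flat prompt: {flat_cost} words, hierarchical prompts: {hier_cost} words")
```

The gap in this toy case depends entirely on the made-up scene; the reduction reported for OmniVLN is the figure given under Findings.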

Taxonomy

Topics: Multimodal Machine Learning Applications · Robotics and Sensor-Based Localization · Advanced Neural Network Applications