LOVON: Legged Open-Vocabulary Object Navigator

Daojie Peng; Jiahang Cao; Qiang Zhang; Jun Ma

arXiv:2507.06747 · cs.RO · July 10, 2025

Open Access

TL;DR

LOVON is a new framework that combines large language models with open-vocabulary visual detection to enable legged robots to perform long-range, open-world object navigation tasks effectively in dynamic environments.

Contribution

It integrates LLM-based hierarchical task planning with open-vocabulary visual detection, and adds stabilization techniques such as Laplacian Variance Filtering, for robust long-range object navigation in unstructured settings.

Findings

01 Successful long-sequence task completion in real-world environments

02 Effective handling of visual jittering and target loss

03 Compatibility across multiple legged robot platforms

Abstract

Object navigation in open-world environments remains a formidable and pervasive challenge for robotic systems, particularly when it comes to executing long-horizon tasks that require both open-world object detection and high-level task planning. Traditional methods often struggle to integrate these components effectively, and this limits their capability to deal with complex, long-range navigation missions. In this paper, we propose LOVON, a novel framework that integrates large language models (LLMs) for hierarchical task planning with open-vocabulary visual detection models, tailored for effective long-range object navigation in dynamic, unstructured environments. To tackle real-world challenges including visual jittering, blind zones, and temporary target loss, we design dedicated solutions such as Laplacian Variance Filtering for visual stabilization. We also develop a functional…
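The abstract names Laplacian Variance Filtering as its visual stabilization step but gives no implementation details here. As a rough illustration only, the sketch below uses the common variance-of-Laplacian sharpness measure: frames that are blurred (for example by gait-induced camera shake) tend to have a low variance and can be skipped before detection. The function names `laplacian_variance` and `filter_sharp_frames`, the 4-neighbour kernel, and the threshold value are all assumptions for this example and are not taken from the paper.

```python
import numpy as np


def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of the 4-neighbour discrete Laplacian over interior pixels.

    Higher values indicate more high-frequency detail, i.e. a sharper frame.
    """
    g = gray.astype(np.float64)
    # Discrete Laplacian: up + down + left + right - 4 * centre (borders skipped).
    lap = (
        g[:-2, 1:-1] + g[2:, 1:-1]
        + g[1:-1, :-2] + g[1:-1, 2:]
        - 4.0 * g[1:-1, 1:-1]
    )
    return float(lap.var())


def filter_sharp_frames(frames, threshold):
    """Keep only frames whose sharpness score meets a hypothetical threshold."""
    return [f for f in frames if laplacian_variance(f) >= threshold]


if __name__ == "__main__":
    # Synthetic stand-ins for camera frames: a featureless frame and a textured one.
    flat = np.full((8, 8), 128, dtype=np.uint8)
    checker = (np.indices((8, 8)).sum(axis=0) % 2 * 255).astype(np.uint8)
    kept = filter_sharp_frames([flat, checker], threshold=100.0)
    print(f"kept {len(kept)} of 2 frames")
```

In practice the threshold would need tuning per camera and scene, and the paper's own filter may differ from this sketch in both the kernel and how rejected frames are handled.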

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Social Robot Interaction and HRI