LOVON: Legged Open-Vocabulary Object Navigator
Daojie Peng, Jiahang Cao, Qiang Zhang, Jun Ma

TL;DR
LOVON is a new framework that combines large language models with open-vocabulary visual detection to enable legged robots to perform long-range, open-world object navigation tasks effectively in dynamic environments.
Contribution
It integrates LLM-based hierarchical task planning with open-vocabulary visual detection, and adds stabilization techniques for robust long-range object navigation in unstructured settings.
Findings
Successful long-sequence task completion in real-world environments
Effective handling of visual jittering and target loss
Compatibility across multiple legged robot platforms
Abstract
Object navigation in open-world environments remains a formidable and pervasive challenge for robotic systems, particularly when it comes to executing long-horizon tasks that require both open-world object detection and high-level task planning. Traditional methods often struggle to integrate these components effectively, and this limits their capability to deal with complex, long-range navigation missions. In this paper, we propose LOVON, a novel framework that integrates large language models (LLMs) for hierarchical task planning with open-vocabulary visual detection models, tailored for effective long-range object navigation in dynamic, unstructured environments. To tackle real-world challenges including visual jittering, blind zones, and temporary target loss, we design dedicated solutions such as Laplacian Variance Filtering for visual stabilization. We also develop a functional…
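The abstract names Laplacian Variance Filtering as its visual stabilization step but does not spell out the details here. As a rough illustration only, the sketch below uses the widely known variance-of-Laplacian sharpness measure and assumes that frames scoring below a threshold are treated as blurred and skipped. The function names, the 4-neighbour kernel, and the threshold value are hypothetical choices for this example, not the authors' implementation.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian response over interior pixels.

    `gray` is a 2D list of grayscale intensities. A low value suggests a
    blurred (low-detail) frame; a high value suggests sharp edges.
    """
    h, w = len(gray), len(gray[0])
    if h < 3 or w < 3:
        raise ValueError("image must be at least 3x3")
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Discrete Laplacian: sum of the four neighbours minus 4x the centre.
            responses.append(
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    return pvariance(responses)


def filter_sharp_frames(frames, threshold):
    """Keep frames whose Laplacian variance meets the (assumed) threshold."""
    return [f for f in frames if laplacian_variance(f) >= threshold]


# Two toy 5x5 frames: a featureless one and a high-contrast checkerboard.
flat = [[128] * 5 for _ in range(5)]
checker = [[255 * ((x + y) % 2) for x in range(5)] for y in range(5)]

# Hypothetical threshold; a real system would tune it per camera and scene.
kept = filter_sharp_frames([flat, checker], threshold=100.0)
```

In this toy setup the featureless frame has zero Laplacian variance and is dropped, while the checkerboard passes. On a robot, a detector would presumably run only on the frames that survive such a filter, which is one plausible way to reduce jitter-induced false detections.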
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Social Robot Interaction and HRI
