Where to Fetch: Extracting Visual Scene Representation from Large Pre-Trained Models for Robotic Goal Navigation
Yu Li, Dayou Li, Chenkun Zhao, Ruifeng Wang, Ran Song, Wei Zhang

TL;DR
This paper introduces a visual scene representation derived from large-scale visual language models, enabling robots to interpret natural language instructions and navigate complex environments effectively.
Contribution
The work presents a novel scene representation method that integrates visual language models with large language models for improved robotic goal navigation.
Findings
Enables robots to follow diverse natural language instructions.
Improves environment understanding for goal-directed navigation.
Demonstrates successful complex task completion in experiments.
Abstract
To complete a complex task where a robot navigates to a goal object and fetches it, the robot needs to have a good understanding of the instructions and the surrounding environment. Large pre-trained models have shown capabilities to interpret tasks defined via language descriptions. However, previous methods attempting to integrate large pre-trained models with daily tasks are not competent in many robotic goal navigation tasks due to poor understanding of the environment. In this work, we present a visual scene representation built with large-scale visual language models to form a feature representation of the environment capable of handling natural language queries. Combined with large language models, this method can parse language instructions into action sequences for a robot to follow, and accomplish goal navigation with querying the scene representation. Experiments demonstrate…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMultimodal Machine Learning Applications · Human Pose and Action Recognition · Robotic Path Planning Algorithms
