A Vision-Language-Action Model with Visual Prompt for OFF-Road Autonomous Driving
Liangdong Zhang, Yiming Nie, Haoyang Li, Fanjie Kong, Baobao Zhang, Shunxin Huang, Kai Fu, Chen Min, Liang Xiao

TL;DR
This paper introduces OFF-EMMA, an end-to-end multimodal framework that combines a visual prompt block with a chain-of-thought with self-consistency (COT-SC) reasoning strategy to improve the accuracy and robustness of trajectory planning for off-road autonomous vehicles.
Contribution
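The paper proposes a visual prompt block, which uses semantic segmentation masks as visual prompts, and a chain-of-thought with self-consistency (COT-SC) reasoning strategy to strengthen spatial perception and stabilize reasoning in vision-language-action models for off-road autonomous driving.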
Findings
Outperforms existing methods on the RELLIS-3D dataset
Reduces average L2 error by 13.3%
Decreases failure rate from 16.52% to 6.56%
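To put the two headline numbers on a common footing, the sketch below computes an average L2 error between a predicted and a reference trajectory, and converts the reported failure rates into absolute and relative reductions. The function names are hypothetical, and the L2 definition (mean Euclidean distance over matched waypoints) is the common convention in trajectory planning, assumed here because this summary does not state the paper's exact metric, planning horizon, or failure criterion.

```python
import math


def average_l2_error(predicted, reference):
    """Mean Euclidean distance between matched waypoints.

    Assumption: both trajectories are equal-length lists of (x, y) tuples.
    The paper's exact metric definition is not given in this summary.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("trajectories must be non-empty and equal in length")
    return sum(math.dist(p, r) for p, r in zip(predicted, reference)) / len(predicted)


def relative_reduction(before, after):
    """Fraction by which `after` is lower than `before`."""
    return (before - after) / before


# Failure rates reported in the findings (percent).
failure_before, failure_after = 16.52, 6.56
absolute_drop = failure_before - failure_after          # percentage points
relative_drop = relative_reduction(failure_before, failure_after)

print(f"absolute drop: {absolute_drop:.2f} percentage points")
print(f"relative drop: {relative_drop:.1%}")
```

Read this way, the failure-rate change is a drop of 9.96 percentage points, or roughly 60% relative to the starting rate.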
Abstract
Efficient trajectory planning in off-road terrains presents a formidable challenge for autonomous vehicles, often necessitating complex multi-step pipelines. However, traditional approaches exhibit limited adaptability in dynamic environments. To address these limitations, this paper proposes OFF-EMMA, a novel end-to-end multimodal framework designed to overcome the deficiencies of insufficient spatial perception and unstable reasoning in visual-language-action (VLA) models for off-road autonomous driving scenarios. The framework explicitly annotates input images through the design of a visual prompt block and introduces a chain-of-thought with self-consistency (COT-SC) reasoning strategy to enhance the accuracy and robustness of trajectory planning. The visual prompt block utilizes semantic segmentation masks as visual prompts, enhancing the spatial understanding ability of pre-trained…
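The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract is truncated here and does not say how the masks are rendered or how multiple reasoning samples are aggregated. The sketch assumes (1) the visual prompt is an alpha-blended color overlay of a binary segmentation mask on the input image, and (2) self-consistency picks the medoid of several sampled trajectories, meaning the candidate closest on average to all the others. All function names and parameters are hypothetical.

```python
import math


def overlay_mask(image, mask, color=(0, 255, 0), alpha=0.4):
    """Blend `color` into every pixel where `mask` is truthy.

    `image` is a list of rows of (r, g, b) tuples; `mask` has the same shape.
    Assumption: an alpha-blended overlay stands in for the paper's visual prompt.
    """
    blended = []
    for image_row, mask_row in zip(image, mask):
        row = []
        for pixel, flagged in zip(image_row, mask_row):
            if flagged:
                pixel = tuple(
                    round((1 - alpha) * p + alpha * c) for p, c in zip(pixel, color)
                )
            row.append(pixel)
        blended.append(row)
    return blended


def trajectory_distance(a, b):
    """Mean Euclidean distance between matched waypoints of two trajectories."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def select_consistent(candidates):
    """Return the medoid: the candidate with the smallest total distance to the rest.

    Assumption: medoid selection is one plausible self-consistency rule,
    chosen for illustration only.
    """
    def total_distance(index):
        return sum(
            trajectory_distance(candidates[index], other) for other in candidates
        )

    return candidates[min(range(len(candidates)), key=total_distance)]


# Three hypothetical trajectories sampled from repeated reasoning passes.
samples = [
    [(0.0, 0.0), (1.0, 0.0)],
    [(0.0, 0.0), (1.0, 1.0)],
    [(0.0, 0.0), (1.0, 2.0)],
]
chosen = select_consistent(samples)
```

Choosing a medoid rather than averaging keeps the output an actual sampled trajectory, so one outlying sample cannot pull the plan toward a path that no reasoning pass proposed.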
Taxonomy
Topics: Multimodal Machine Learning Applications · Autonomous Vehicle Technology and Safety · Robotic Path Planning Algorithms
