HERMES: A Holistic End-to-End Risk-Aware Multimodal Embodied System with Vision-Language Models for Long-Tail Autonomous Driving
Weizhe Tang, Junwei You, Jiaxi Liu, Zhaoyi Wang, Rui Gan, Zilin Huang, Feng Wei, Bin Ran

TL;DR
HERMES is an end-to-end autonomous driving framework that injects explicit long-tail risk cues into multimodal perception and trajectory planning. It targets complex long-tail mixed-traffic scenarios involving heterogeneous road users, with the goal of improving driving safety and planning accuracy.
Contribution
This paper introduces HERMES, a novel framework that incorporates explicit long-tail risk cues and a tri-modal perception module for safer, more reliable autonomous driving in complex scenarios.
Findings
HERMES outperforms baseline models in long-tail mixed-traffic scenarios.
Structured long-tail scene and planning contexts improve risk-awareness.
Ablation studies confirm the effectiveness of key components.
Abstract
End-to-end autonomous driving models increasingly benefit from large vision-language models for semantic understanding, yet ensuring safe and accurate operation under long-tail conditions remains challenging. These challenges are particularly prominent in long-tail mixed-traffic scenarios, where autonomous vehicles must interact with heterogeneous road users, including human-driven vehicles and vulnerable road users, under complex and uncertain conditions. This paper proposes HERMES, a holistic risk-aware end-to-end multimodal driving framework designed to inject explicit long-tail risk cues into trajectory planning. HERMES employs a foundation-model-assisted annotation pipeline to produce structured Long-Tail Scene Context and Long-Tail Planning Context, capturing hazard-centric cues together with maneuver intent and safety preference, and uses these signals to guide end-to-end…
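The abstract does not say how the Long-Tail Scene Context and Long-Tail Planning Context are represented. Purely as an illustration, assuming they are stored as typed records and serialized into text for a vision-language model prompt, they might look like the sketch below. Every class name, field, and the `to_prompt` helper are hypothetical and are not taken from the paper; they only mirror the abstract's description (hazard-centric cues on one side, maneuver intent and safety preference on the other).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Hazard:
    # Hypothetical hazard-centric cue (field names are assumptions).
    agent_type: str   # e.g. "pedestrian", "cyclist", "human-driven vehicle"
    description: str  # free-text cue, e.g. from an annotation pipeline
    risk_level: str   # assumed coarse label such as "low" / "medium" / "high"


@dataclass
class LongTailSceneContext:
    # Hypothetical container for hazard-centric scene cues.
    summary: str
    hazards: List[Hazard] = field(default_factory=list)


@dataclass
class LongTailPlanningContext:
    # Hypothetical container for maneuver intent and safety preference.
    maneuver_intent: str
    safety_preference: str


def to_prompt(scene: LongTailSceneContext, plan: LongTailPlanningContext) -> str:
    """Serialize both contexts into one text block (assumed prompt format)."""
    lines = [f"Scene: {scene.summary}"]
    # Number hazards from 1 so each cue can be referenced individually.
    for i, h in enumerate(scene.hazards, start=1):
        lines.append(f"Hazard {i} [{h.risk_level}] {h.agent_type}: {h.description}")
    lines.append(f"Intent: {plan.maneuver_intent}")
    lines.append(f"Safety preference: {plan.safety_preference}")
    return "\n".join(lines)


if __name__ == "__main__":
    scene = LongTailSceneContext(
        summary="Unprotected left turn at a busy intersection",
        hazards=[Hazard("cyclist", "crossing against the signal", "high")],
    )
    plan = LongTailPlanningContext("yield, then turn left", "keep a wide margin from the cyclist")
    print(to_prompt(scene, plan))
```

Keeping scene cues and planning preferences in separate records reflects the abstract's split between the two contexts; a real system could equally use JSON or embed the cues directly rather than as prompt text.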
Taxonomy
Topics: Autonomous Vehicle Technology and Safety · Robotic Path Planning Algorithms · Multimodal Machine Learning Applications
