AD-H: Autonomous Driving with Hierarchical Agents
Zaibin Zhang, Shiyu Tang, Yuanhang Zhang, Talas Fu, Yifan Wang, Yang Liu, Dong Wang, Jing Shao, Lijun Wang, Huchuan Lu

TL;DR
AD-H introduces a hierarchical multi-agent system for autonomous driving that leverages large language models for high-level reasoning and a lightweight controller for low-level control, enhancing performance and generalization.
Contribution
The paper presents a novel hierarchical architecture that connects high-level instructions and low-level control through mid-level, language-driven commands, improving the generalizability and performance of MLLM-based autonomous driving systems.
Findings
Outperforms state-of-the-art methods in driving tasks
Exhibits self-correction during vehicle operation
Shows superior generalization to new environments
Abstract
Due to the impressive capabilities of multimodal large language models (MLLMs), recent works have focused on employing MLLM-based agents for autonomous driving in large-scale and dynamic environments. However, prevalent approaches often directly translate high-level instructions into low-level vehicle control signals, which deviates from the inherent language generation paradigm of MLLMs and fails to fully harness their emergent powers. As a result, the generalizability of these methods is highly restricted by autonomous driving datasets used during fine-tuning. To tackle this challenge, we propose to connect high-level instructions and low-level control signals with mid-level language-driven commands, which are more fine-grained than high-level instructions but more universal and explainable than control signals, and thus can effectively bridge the gap in between. We implement this…
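The following minimal Python sketch shows where a mid-level command sits between a high-level instruction and a low-level control signal. It is not the AD-H implementation. The command vocabulary, the keyword-based `plan_mid_level` (standing in for the MLLM planner), and the lookup-table `control` (standing in for the learned lightweight controller) are all hypothetical simplifications.

```python
from dataclasses import dataclass
from enum import Enum


class MidLevelCommand(Enum):
    """Hypothetical vocabulary of mid-level, language-driven commands."""
    KEEP_LANE = "keep lane"
    TURN_LEFT = "turn left"
    TURN_RIGHT = "turn right"
    ACCELERATE = "accelerate"
    DECELERATE = "decelerate"
    STOP = "stop"


@dataclass
class ControlSignal:
    steer: float     # assumed range [-1, 1], negative = left
    throttle: float  # assumed range [0, 1]
    brake: float     # assumed range [0, 1]


def plan_mid_level(instruction: str, obstacle_ahead: bool) -> MidLevelCommand:
    """Stand-in for the MLLM planner: maps a high-level instruction plus a
    toy perception flag to a mid-level command using keyword rules only."""
    if obstacle_ahead:
        return MidLevelCommand.STOP
    text = instruction.lower()
    if "left" in text:
        return MidLevelCommand.TURN_LEFT
    if "right" in text:
        return MidLevelCommand.TURN_RIGHT
    if "slow" in text:
        return MidLevelCommand.DECELERATE
    return MidLevelCommand.KEEP_LANE


# Stand-in for the lightweight controller: a fixed command -> signal table.
CONTROL_TABLE = {
    MidLevelCommand.KEEP_LANE: ControlSignal(0.0, 0.5, 0.0),
    MidLevelCommand.TURN_LEFT: ControlSignal(-0.5, 0.3, 0.0),
    MidLevelCommand.TURN_RIGHT: ControlSignal(0.5, 0.3, 0.0),
    MidLevelCommand.ACCELERATE: ControlSignal(0.0, 0.8, 0.0),
    MidLevelCommand.DECELERATE: ControlSignal(0.0, 0.0, 0.3),
    MidLevelCommand.STOP: ControlSignal(0.0, 0.0, 1.0),
}


def control(command: MidLevelCommand) -> ControlSignal:
    """Translate a mid-level command into a low-level control signal."""
    return CONTROL_TABLE[command]


def drive_step(instruction: str, obstacle_ahead: bool) -> tuple[MidLevelCommand, ControlSignal]:
    """One step of the hierarchy: instruction -> mid-level command -> control."""
    command = plan_mid_level(instruction, obstacle_ahead)
    return command, control(command)


if __name__ == "__main__":
    cmd, sig = drive_step("Turn left at the next intersection", obstacle_ahead=False)
    print(cmd.value, sig)
```

In this sketch the controller only ever sees the mid-level command, never the raw instruction. The planner can therefore be replaced without touching the control code, which mirrors the decoupling the abstract argues for.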
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
There are existing works that apply MLLMs to autonomous driving, and the hierarchical MLLM agent paradigm introduced by this paper is similar to the chain-of-thought technique in the LLM area. This paper is original in that it brings this chain-of-thought technique to MLLM-based autonomous driving and shows advantages in generalization ability and driving behavior. Overall, the paper is easy to follow and clearly written.
- The authors did not present failure cases. On the LangAuto benchmark, the best route completion rate is 53.2%; in which cases did the model fail to complete the route?
1. From their statements, we can agree that the authors have made tremendous efforts to build a large autonomous driving dataset with multi-level driving command annotations to evaluate the model. 2. The ablation study of the experiment has been carefully designed to investigate influences from the training datasets and the different downstream controllers.
In short, the paper overall makes only a minuscule contribution to society. Here are some key problems in its fundamental logic and experiment design: 1. The authors suggest that by decoupling the perception-planning and control stages, one can better leverage LLMs' *emergent capabilities*. However, their justification is rather feeble: the hierarchy frees the LLM from decoding low-level control signals so that it can focus on perceptive reasoning and planning. From a methodo
1. **Conceptual Simplicity and Design Insight**: The hierarchical agent framework, separating the MLLM planner and the controller, is a conceptually straightforward and interesting approach. The design highlights MLLMs’ strengths in high-level reasoning while minimizing their drawbacks in directly outputting low-level action. 2. **Dataset Contribution**: The new dataset with hierarchical action annotations is a valuable resource for advancing autonomous driving research, especially in tasks req
1. **Evaluation of Consistency between Language and Action**: While the hierarchical framework is well-conceived, further evaluation of how consistent the language commands and executed actions are would be beneficial. In other words, do the final control actions comply with the mid-level commands? This is essential for fully validating the model’s real-world safety and applicability. 2. **Labeling of Mid-Level Instructions**: While the authors clearly introduced the hierarchical design, the me
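The consistency evaluation requested above could be operationalized in many ways. One minimal sketch: check each executed control signal against a rule for the mid-level command that was issued, then report the fraction of compliant steps. The command vocabulary, the steer/throttle/brake signal, the rule table, and the threshold `STEER_EPS` are all hypothetical placeholders, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class ControlSignal:
    steer: float     # assumed range [-1, 1], negative = left
    throttle: float  # assumed range [0, 1]
    brake: float     # assumed range [0, 1]


# Hypothetical compliance rules: each mid-level command maps to a predicate
# over the executed control signal. Thresholds are illustrative, not tuned.
STEER_EPS = 0.1

RULES = {
    "turn left": lambda s: s.steer < -STEER_EPS,
    "turn right": lambda s: s.steer > STEER_EPS,
    "keep lane": lambda s: abs(s.steer) <= STEER_EPS,
    "stop": lambda s: s.brake > 0.5 and s.throttle == 0.0,
    "decelerate": lambda s: s.brake > 0.0 or s.throttle == 0.0,
    "accelerate": lambda s: s.throttle > 0.0 and s.brake == 0.0,
}


def complies(command: str, signal: ControlSignal) -> bool:
    """Return True if the executed signal is consistent with the command."""
    rule = RULES.get(command)
    if rule is None:
        raise ValueError(f"unknown command: {command!r}")
    return rule(signal)


def consistency_rate(pairs: list[tuple[str, ControlSignal]]) -> float:
    """Fraction of (command, executed signal) pairs that comply."""
    if not pairs:
        return 0.0
    return sum(complies(c, s) for c, s in pairs) / len(pairs)


if __name__ == "__main__":
    log = [
        ("turn left", ControlSignal(-0.4, 0.3, 0.0)),
        ("stop", ControlSignal(0.0, 0.2, 0.6)),  # throttle still on: non-compliant
        ("keep lane", ControlSignal(0.05, 0.5, 0.0)),
        ("turn right", ControlSignal(0.3, 0.3, 0.0)),
    ]
    print(consistency_rate(log))  # 0.75
```

A per-step rule check like this is coarse. A fuller evaluation would likely look at behavior over a short window, since a vehicle may legitimately lag a command by a few control steps.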
1. The paper is clearly presented and easy to understand. 2. AD-H fully leverages the MLLM’s capabilities in perception, reasoning, and planning by focusing on generating mid-level commands rather than direct control signals, which maximizes the utility of the MLLM in autonomous driving. 3. The proposed system demonstrates impressive performance, including self-correction during operation, and outperforms current state-of-the-art methods in generalization, especially in new or complex driving scena
1. The innovation of hierarchical planning is limited: DriveMLM has already investigated LLM-based mid-level lateral and longitudinal decision commands by setting decision states in CARLA. 2. This paper uses LLaVA-7B and OPT-350M as the high-level planner and low-level controller, respectively. Can the 7B planner provide accurate mid-level instructions? Can the 350M controller output a well-planned trajectory based on ground truth instructions? 3. Separate evaluation experiments for each module sho
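The separate per-module evaluation suggested here could be set up roughly as follows. This is a sketch under illustrative assumptions, not the paper's protocol. The planner is scored by exact-match accuracy of its mid-level commands against ground-truth annotations. The controller is scored in isolation by feeding it ground-truth commands and measuring average displacement error (ADE) between its predicted waypoints and the reference trajectory. The function names and the choice of ADE are hypothetical.

```python
import math
from collections.abc import Callable

Point = tuple[float, float]


def planner_accuracy(predicted: list[str], ground_truth: list[str]) -> float:
    """Exact-match accuracy of predicted mid-level commands."""
    if len(predicted) != len(ground_truth):
        raise ValueError("length mismatch")
    if not ground_truth:
        return 0.0
    hits = sum(p == g for p, g in zip(predicted, ground_truth))
    return hits / len(ground_truth)


def average_displacement_error(pred: list[Point], gt: list[Point]) -> float:
    """Mean Euclidean distance between corresponding waypoints (ADE)."""
    if len(pred) != len(gt) or not gt:
        raise ValueError("trajectories must be non-empty and of equal length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def evaluate_controller(
    controller: Callable[[str], list[Point]],
    samples: list[tuple[str, list[Point]]],
) -> float:
    """Feed ground-truth commands to the controller alone (no planner in the
    loop) and return the mean ADE over all samples."""
    if not samples:
        raise ValueError("no samples")
    errors = [average_displacement_error(controller(cmd), gt) for cmd, gt in samples]
    return sum(errors) / len(errors)
```

Isolating the controller with ground-truth commands separates its errors from planner errors. Comparing this score with the end-to-end result would indicate which module limits overall driving performance.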
- The hierarchical decomposition of driving tasks into high-level planning and low-level control is well-motivated and effectively leverages the strengths of MLLMs
- The mid-level driving commands provide an elegant interface between language understanding and vehicle control
- Clear and well-organized presentation
- Creation of a new dataset with hierarchical annotations
### Limited baseline comparisons:
- Should include more recent works like CarLLaVA and other hierarchical approaches
### Insufficient ablation studies:
- No analysis of the impact of different granularities of mid-level commands
### Limited novelty
The core idea of hierarchical decomposition is relatively straightforward, and similar approaches have been explored in diverse domains, including autonomous driving and robotics control.
### The experimental validation could be more comprehensive:
- On
Taxonomy
Topics: Multi-Agent Systems and Negotiation · Transportation and Mobility Innovations · Autonomous Vehicle Technology and Safety
