ADAPT: Action-aware Driving Caption Transformer
Bu Jin, Xinyu Liu, Yupeng Zheng, Pengfei Li, Hao Zhao, Tong Zhang, Yuhang Zheng, Guyue Zhou, Jingjing Liu

TL;DR
ADAPT is a transformer-based model that generates natural language explanations for autonomous driving decisions, improving transparency and interpretability in self-driving systems.
Contribution
The paper introduces ADAPT, a novel end-to-end transformer architecture that jointly predicts driving actions and provides human-readable explanations, advancing explainability in autonomous driving.
Findings
Achieves state-of-the-art performance on the BDD-X dataset.
Provides real-time action narrations and reasoning.
Outperforms existing methods in automatic and human evaluations.
Abstract
End-to-end autonomous driving has great potential in the transportation industry. However, the lack of transparency and interpretability of the automatic decision-making process hinders its industrial adoption in practice. There have been some early attempts to use attention maps or cost volumes for better model explainability, but these are difficult for ordinary passengers to understand. To bridge the gap, we propose an end-to-end transformer-based architecture, ADAPT (Action-aware Driving cAPtion Transformer), which provides user-friendly natural language narrations and reasoning for each decision-making step of autonomous vehicular control and action. ADAPT jointly trains the driving caption task and the vehicular control prediction task through a shared video representation. Experiments on the BDD-X (Berkeley DeepDrive eXplanation) dataset demonstrate state-of-the-art performance of the…
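The joint training described in the abstract can be illustrated with a minimal sketch of a combined objective. The abstract does not state the loss functions or how they are weighted, so everything here is an assumption: token-level cross-entropy for the caption task, mean squared error for the control prediction task, and a hypothetical `control_weight` that balances the two. The shared video encoder itself is not modeled; the sketch only shows how two task losses computed from the same representation could be summed into one training signal.

```python
import math


def cross_entropy(logits, target_index):
    """Negative log-likelihood of the target token under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target_index]


def caption_loss(token_logits, target_tokens):
    """Mean token-level cross-entropy over one caption (assumed loss form)."""
    losses = [cross_entropy(l, t) for l, t in zip(token_logits, target_tokens)]
    return sum(losses) / len(losses)


def control_loss(predicted, target):
    """Mean squared error over control signals (assumed loss form)."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def joint_loss(token_logits, target_tokens, predicted_controls,
               target_controls, control_weight=1.0):
    """Weighted sum of both task losses; control_weight is hypothetical."""
    return (caption_loss(token_logits, target_tokens)
            + control_weight * control_loss(predicted_controls, target_controls))
```

In a real model both heads would read from the same video features, so the gradient of this single scalar would update the shared encoder with signal from both tasks.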
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Explainable Artificial Intelligence (XAI)
