ADAPT: Action-aware Driving Caption Transformer

Bu Jin; Xinyu Liu; Yupeng Zheng; Pengfei Li; Hao Zhao; Tong Zhang; Yuhang Zheng; Guyue Zhou; Jingjing Liu

arXiv:2302.00673·cs.CV·February 2, 2023

TL;DR

ADAPT is a transformer-based model that generates natural language explanations for autonomous driving decisions, improving transparency and interpretability in self-driving systems.

Contribution

The paper introduces ADAPT, a novel end-to-end transformer architecture that jointly predicts driving actions and provides human-readable explanations, advancing explainability in autonomous driving.

Findings

01. Achieves state-of-the-art performance on the BDD-X dataset.

02. Provides real-time action narrations and reasoning.

03. Outperforms existing methods in automatic and human evaluations.

Abstract

End-to-end autonomous driving has great potential in the transportation industry. However, the lack of transparency and interpretability of the automatic decision-making process hinders its industrial adoption in practice. There have been some early attempts to use attention maps or cost volumes for better model explainability, but these are difficult for ordinary passengers to understand. To bridge the gap, we propose an end-to-end transformer-based architecture, ADAPT (Action-aware Driving cAPtion Transformer), which provides user-friendly natural language narrations and reasoning for each decision-making step of autonomous vehicular control and action. ADAPT jointly trains both the driving caption task and the vehicular control prediction task through a shared video representation. Experiments on the BDD-X (Berkeley DeepDrive eXplanation) dataset demonstrate state-of-the-art performance of the…
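The joint training described in the abstract can be pictured as a single objective that sums a captioning loss and a control-prediction loss, both computed from outputs that share one video representation. The sketch below is only an illustration under stated assumptions: the choice of cross-entropy for caption tokens, mean squared error for control signals, the `control_weight` parameter, and all function names are hypothetical and are not taken from the paper or from the jxbbb/adapt repository.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list:
    """Numerically stable softmax over one token's vocabulary logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def caption_loss(token_logits: Sequence[Sequence[float]],
                 target_ids: Sequence[int]) -> float:
    """Mean cross-entropy over caption tokens (assumed loss form)."""
    total = 0.0
    for logits, target in zip(token_logits, target_ids):
        total += -math.log(softmax(logits)[target])
    return total / len(target_ids)


def control_loss(predicted: Sequence[float],
                 observed: Sequence[float]) -> float:
    """Mean squared error over control signals (assumed loss form)."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return sum(squared) / len(squared)


def joint_loss(token_logits: Sequence[Sequence[float]],
               target_ids: Sequence[int],
               predicted: Sequence[float],
               observed: Sequence[float],
               control_weight: float = 1.0) -> float:
    """Weighted sum of both task losses.

    In a real model, token_logits and predicted would both be produced by
    heads that read the same shared video features, so gradients from both
    terms update the shared encoder. control_weight is a hypothetical knob.
    """
    return (caption_loss(token_logits, target_ids)
            + control_weight * control_loss(predicted, observed))
```

Calling `joint_loss` with per-token logits, target token ids, and predicted versus recorded control values returns one scalar to minimize; raising or lowering `control_weight` shifts how strongly the control task shapes the shared representation relative to captioning.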

Code & Models

Repositories

jxbbb/adapt (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Explainable Artificial Intelligence (XAI)