Rank2Tell: A Multimodal Driving Dataset for Joint Importance Ranking and Reasoning

Enna Sachdeva; Nakul Agarwal; Suhas Chundi; Sean Roelofs; Jiachen Li; Mykel Kochenderfer; Chiho Choi; Behzad Dariush

arXiv:2309.06597·cs.CV·November 9, 2023

TL;DR

This paper presents Rank2Tell, a comprehensive multimodal dataset for autonomous vehicle scene understanding, and introduces a joint model for importance ranking and explanation generation, advancing interpretability in autonomous driving systems.

Contribution

The paper introduces Rank2Tell, a novel multi-modal dataset with dense annotations for importance ranking and reasoning, and proposes a joint model for importance ranking and caption generation.

Findings

01. The dataset enables detailed semantic and relational scene understanding.

02. The joint model achieves promising results in importance ranking and captioning tasks.

03. Rank2Tell facilitates research on interpretability and trustworthiness in autonomous driving.

Abstract

The widespread adoption of commercial autonomous vehicles (AVs) and advanced driver assistance systems (ADAS) may largely depend on their acceptance by society, for which their perceived trustworthiness and interpretability to riders are crucial. In general, achieving this is challenging because modern autonomous systems software relies heavily on black-box artificial intelligence models. Towards this goal, this paper introduces a novel dataset, Rank2Tell, a multi-modal ego-centric dataset for Ranking the importance level and Telling the reason for the importance. Using a range of closed- and open-ended visual question answering, the dataset provides dense annotations of semantic, spatial, temporal, and relational attributes of important objects in complex traffic scenarios. The dense annotations and unique attributes of the dataset make it a valuable resource for researchers…

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition