Rank2Tell: A Multimodal Driving Dataset for Joint Importance Ranking and Reasoning
Enna Sachdeva, Nakul Agarwal, Suhas Chundi, Sean Roelofs, Jiachen Li, Mykel Kochenderfer, Chiho Choi, Behzad Dariush

TL;DR
This paper presents Rank2Tell, a comprehensive multimodal dataset for autonomous vehicle scene understanding, and introduces a joint model for importance ranking and explanation generation, advancing interpretability in autonomous driving systems.
Contribution
The paper introduces Rank2Tell, a novel multi-modal dataset with dense annotations for importance ranking and reasoning, and proposes a joint model for importance ranking and caption generation.
Findings
The dataset enables detailed semantic and relational scene understanding.
The joint model achieves promising results in importance ranking and captioning tasks.
Rank2Tell facilitates research on interpretability and trustworthiness in autonomous driving.
Abstract
The widespread adoption of commercial autonomous vehicles (AVs) and advanced driver assistance systems (ADAS) may largely depend on their acceptance by society, for which their perceived trustworthiness and interpretability to riders are crucial. In general, this task is challenging because modern autonomous systems software relies heavily on black-box artificial intelligence models. Towards this goal, this paper introduces a novel dataset, Rank2Tell, a multi-modal ego-centric dataset for Ranking the importance level and Telling the reason for the importance. Using a range of closed- and open-ended visual question answering, the dataset provides dense annotations of semantic, spatial, temporal, and relational attributes of important objects in complex traffic scenarios. The dense annotations and unique attributes of the dataset make it a valuable resource for researchers…
Videos
Rank2Tell: A Multimodal Driving Dataset for Joint Importance Ranking and Reasoning · YouTube
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition
