Automated Evaluation of Large Vision-Language Models on Self-driving Corner Cases

Kai Chen; Yanze Li; Wenhua Zhang; Yanxin Liu; Pengxiang Li; Ruiyuan Gao; Lanqing Hong; Meng Tian; Xinhai Zhao; Zhenguo Li; Dit-Yan Yeung; Huchuan Lu; Xu Jia

arXiv:2404.10595 · cs.CV · December 9, 2024 · 2 cites

TL;DR

This paper introduces CODA-LM, a benchmark for automated evaluation of large vision-language models in self-driving corner cases, and presents CODA-VLM, a new driving LVLM that outperforms existing models and rivals GPT-4V.

Contribution

The paper proposes the first automated benchmark for evaluating LVLMs on self-driving corner cases and develops a new LVLM that surpasses open-source models and approaches GPT-4V performance.

Findings

01 CODA-LM effectively evaluates LVLMs on complex driving scenarios.

02 CODA-VLM outperforms all open-source counterparts on the benchmark.

03 CODA-VLM achieves comparable performance to GPT-4V, surpassing it in regional perception.

Abstract

Large Vision-Language Models (LVLMs) have received widespread attention for advancing interpretable self-driving. Existing evaluations of LVLMs primarily focus on multi-faceted capabilities in natural circumstances and lack automated, quantifiable assessment for self-driving, let alone for severe road corner cases. In this work, we propose CODA-LM, the first benchmark for the automatic evaluation of LVLMs on self-driving corner cases. We adopt a hierarchical data structure and prompt powerful LVLMs to analyze complex driving scenes and generate high-quality pre-annotations for human annotators. For LVLM evaluation, we show that using text-only large language models (LLMs) as judges yields even better alignment with human preferences than using LVLM judges. Moreover, with CODA-LM, we build CODA-VLM, a new driving LVLM surpassing all open-sourced counterparts…
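One way to make "alignment with human preferences" concrete is to measure how often a judge ranks pairs of model responses in the same order as human raters. The sketch below is only an illustration of that idea, not the paper's evaluation protocol: the function name `pairwise_agreement`, the 1-10 rating scale, and all scores are hypothetical placeholders rather than data from CODA-LM.

```python
from itertools import combinations


def pairwise_agreement(judge_scores, human_scores):
    """Fraction of response pairs that the judge orders the same way as humans.

    Pairs that humans rated as ties are skipped. A judge tie on a pair
    that humans did not tie counts as a disagreement.
    """
    if len(judge_scores) != len(human_scores):
        raise ValueError("score lists must have the same length")

    agree = 0
    total = 0
    for i, j in combinations(range(len(human_scores)), 2):
        human_diff = human_scores[i] - human_scores[j]
        if human_diff == 0:
            continue  # no human preference to align with
        judge_diff = judge_scores[i] - judge_scores[j]
        total += 1
        if judge_diff * human_diff > 0:  # same sign means same ordering
            agree += 1
    return agree / total if total else 0.0


# Made-up ratings for five responses (assumption: NOT results from the paper).
human = [8, 3, 6, 9, 5]
judge_a = [7, 2, 6, 9, 4]  # hypothetical judge A
judge_b = [6, 5, 4, 9, 7]  # hypothetical judge B

print("judge A agreement:", pairwise_agreement(judge_a, human))
print("judge B agreement:", pairwise_agreement(judge_b, human))
```

A higher agreement rate means the judge's ranking tracks the human ranking more closely. Rank correlations such as Spearman's or Kendall's coefficient are common alternatives to this simple rate.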

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications

Methods: Focus