Automated Evaluation of Large Vision-Language Models on Self-driving Corner Cases
Kai Chen, Yanze Li, Wenhua Zhang, Yanxin Liu, Pengxiang Li, Ruiyuan Gao, Lanqing Hong, Meng Tian, Xinhai Zhao, Zhenguo Li, Dit-Yan Yeung, Huchuan Lu, Xu Jia

TL;DR
This paper introduces CODA-LM, a benchmark for automated evaluation of large vision-language models in self-driving corner cases, and presents CODA-VLM, a new driving LVLM that outperforms existing models and rivals GPT-4V.
Contribution
The paper proposes the first automated benchmark for evaluating LVLMs on self-driving corner cases and develops a new LVLM that surpasses open-source models and approaches GPT-4V performance.
Findings
CODA-LM effectively evaluates LVLMs on complex driving scenarios.
CODA-VLM outperforms all open-source counterparts on the benchmark.
CODA-VLM achieves comparable performance to GPT-4V, surpassing it in regional perception.
Abstract
Large Vision-Language Models (LVLMs) have received widespread attention for advancing interpretable self-driving. Existing evaluations of LVLMs primarily focus on multi-faceted capabilities in natural circumstances and lack automated, quantifiable assessment for self-driving, let alone for severe road corner cases. In this work, we propose CODA-LM, the very first benchmark for the automatic evaluation of LVLMs on self-driving corner cases. We adopt a hierarchical data structure and prompt powerful LVLMs to analyze complex driving scenes and generate high-quality pre-annotations for human annotators. For LVLM evaluation, we show that using text-only large language models (LLMs) as judges yields even better alignment with human preferences than using LVLM judges. Moreover, with CODA-LM, we build CODA-VLM, a new driving LVLM surpassing all open-source counterparts…
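The text-only judging idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the prompt wording, the 1 to 10 scale, the `Score: <integer>` reply format, and the names `build_judge_prompt`, `parse_score`, `judge`, and `fake_llm` are all assumptions made for illustration. The judge receives only text (a human-verified reference annotation and the model's response), never the image.

```python
import re
from typing import Callable, Optional


def build_judge_prompt(reference: str, response: str) -> str:
    """Assemble a text-only prompt; the judge never sees the image."""
    # Hypothetical wording and scale, not taken from the paper.
    return (
        "You are grading a description of a driving scene.\n"
        f"Reference annotation:\n{reference}\n\n"
        f"Model response:\n{response}\n\n"
        "Rate the response from 1 to 10 for agreement with the reference. "
        "Answer in the form 'Score: <integer>'."
    )


def parse_score(judge_output: str, low: int = 1, high: int = 10) -> Optional[int]:
    """Extract the integer score; return None if missing or out of range."""
    match = re.search(r"Score:\s*(\d+)", judge_output)
    if match is None:
        return None
    score = int(match.group(1))
    return score if low <= score <= high else None


def judge(reference: str, response: str, llm: Callable[[str], str]) -> Optional[int]:
    """Score one response with any text-in, text-out LLM callable."""
    return parse_score(llm(build_judge_prompt(reference, response)))


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call so the sketch runs offline."""
    return "Score: 7"


result = judge(
    reference="A traffic cone lies in the ego lane; slow down and steer around it.",
    response="There is a cone ahead in our lane, so the car should slow down.",
    llm=fake_llm,
)
```

Passing the judge as a plain callable keeps the sketch independent of any particular LLM client; a real setup would replace `fake_llm` with a call to the chosen model.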
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications
