TRIP-Evaluate: An Open Multimodal Benchmark for Evaluating Large Models in Transportation

Han Gong, Zhen Zhou, Yunyang Shi, Yan Tan, Jinbiao Huo, Qi Hong, and Zhiyuan Liu

arXiv:2605.00907 · cs.CV · May 5, 2026

TL;DR

TRIP-Evaluate is a multimodal benchmark for evaluating large models on transportation tasks involving text, images, and point-cloud data. It targets a gap left by general benchmarks and by the few existing transportation benchmarks, which are narrow in scope.

Contribution

The paper introduces an open benchmark of 837 items covering diverse transportation functions, enabling fine-grained diagnosis and comparison of multimodal large models.

Findings

01. Text performance is improving across models.

02. Weaknesses remain in engineering calculations and scene understanding.

03. The benchmark supports fine-grained diagnosis of failure modes.

Abstract

Large language models (LLMs) and multimodal large models (MLLMs) are increasingly used for transportation tasks such as regulation question answering, traffic management support, engineering review, and autonomous-driving scene reasoning. Yet transportation workflows are rule-intensive, computation-intensive, safety-critical, and inherently multimodal. Existing general benchmarks provide limited evidence of whether a model can apply regulations correctly, perform verifiable engineering calculations, or interpret traffic scenes reliably, while the small number of public transportation benchmarks remain narrow in scope and rarely support fine-grained diagnosis across text, images, and point-cloud data. To address this gap, we present TRIP-Evaluate, an open multimodal benchmark for large models in transportation. The benchmark organizes 837 items using a role-task-knowledge taxonomy that…
