CyPortQA: Benchmarking Multimodal Large Language Models for Cyclone Preparedness in Port Operation

Chenchen Kuai; Chenhao Wu; Yang Zhou; Xiubin Bruce Wang; Tianbao Yang; Zhengzhong Tu; Zihao Li; Yunlong Zhang

arXiv:2508.15846 · cs.CL · November 17, 2025


TL;DR

This paper introduces CyPortQA, a comprehensive benchmark for evaluating multimodal large language models' ability to assist U.S. port operations in cyclone preparedness by integrating diverse data sources and assessing reasoning capabilities.

Contribution

We present CyPortQA, the first specialized benchmark for multimodal models in port cyclone scenarios, with a large dataset of real-world cases and automated question-answer pairs for evaluation.

Findings

01

MLLMs show strong potential in understanding cyclone scenarios

02

Models face challenges in impact estimation and decision reasoning

03

Benchmark enables systematic evaluation of multimodal model performance

Abstract

As tropical cyclones intensify and track forecasts become increasingly uncertain, U.S. ports face heightened supply-chain risk under extreme weather conditions. Port operators need to rapidly synthesize diverse multimodal forecast products, such as probabilistic wind maps, track cones, and official advisories, into clear, actionable guidance as cyclones approach. Multimodal large language models (MLLMs) offer a powerful means to integrate these heterogeneous data sources alongside broader contextual knowledge, yet their accuracy and reliability in the specific context of port cyclone preparedness have not been rigorously evaluated. To fill this gap, we introduce CyPortQA, the first multimodal benchmark tailored to port operations under cyclone threat. CyPortQA assembles 2,917 real-world disruption scenarios from 2015 through 2023, spanning 145 U.S. principal ports and 90 named storms.…
