TL;DR
This paper introduces CyPortQA, a comprehensive benchmark for evaluating how well multimodal large language models (MLLMs) can assist U.S. port operations with cyclone preparedness, both by integrating diverse data sources and by reasoning over them.
Contribution
We present CyPortQA, the first benchmark specialized for multimodal models in port cyclone scenarios. It comprises a large dataset of real-world cases and automatically generated question-answer pairs for evaluation.
Findings
MLLMs show strong potential in understanding cyclone scenarios
Models face challenges in impact estimation and decision reasoning
Benchmark enables systematic evaluation of multimodal model performance
Abstract
As tropical cyclones intensify and track forecasts become increasingly uncertain, U.S. ports face heightened supply-chain risk under extreme weather conditions. Port operators need to rapidly synthesize diverse multimodal forecast products, such as probabilistic wind maps, track cones, and official advisories, into clear, actionable guidance as cyclones approach. Multimodal large language models (MLLMs) offer a powerful means to integrate these heterogeneous data sources alongside broader contextual knowledge, yet their accuracy and reliability in the specific context of port cyclone preparedness have not been rigorously evaluated. To fill this gap, we introduce CyPortQA, the first multimodal benchmark tailored to port operations under cyclone threat. CyPortQA assembles 2,917 real-world disruption scenarios from 2015 through 2023, spanning 145 U.S. principal ports and 90 named storms.…