SynopticBench: Evaluating Vision-Language Models on Generating Weather Forecast Discussions of the Future
Timothy B. Higgins, Antonios Mamalakis, and Chirag Agarwal

TL;DR
This paper introduces SynopticBench, a large dataset and evaluation framework for assessing vision-language models' ability to generate weather forecast discussions from meteorological data.
Contribution
It provides a new dataset pairing National Weather Service Area Forecast Discussions with forecast images, along with a novel method for evaluating how well generated text describes synoptic weather phenomena.
Findings
State-of-the-art VLMs show sensitivity to evaluation metrics in weather text generation.
Extensive experiments reveal the challenges that current models face in generating weather forecast discussions.
Abstract
Recent advances in vision-language models (VLMs) have led to significant improvements in a plethora of complex multimodal tasks like image captioning, report generation, and visual perception. However, generating text from meteorological data is highly challenging because the atmosphere is a chaotic system that changes rapidly at various spatial and temporal scales. Given the complexity of atmospheric phenomena, it is critical to verifiably quantify the effectiveness of existing VLMs on weather forecasting data. In this work, we present SynopticBench, a high-quality dataset consisting of 1,367,041 text samples of Area Forecast Discussions created by the National Weather Service over the continental United States, paired with images of 500 mb geopotential height, 2-meter temperature, and 850 mb wind velocity in weather forecasts. We also present Synoptic Phenomena Alignment and Coverage…
