S2SServiceBench: A Multimodal Benchmark for Last-Mile S2S Climate Services
Chenyue Li, Wen Deng, Zhuotao Sun, Mengxi Jin, Hanzhe Cui, Han Li, Shentong Li, Man Kit Yu, Ming Long Lai, Yuhao Yang, Mengqian Lu, Binhang Yuan

TL;DR
This paper introduces S2SServiceBench, a comprehensive multimodal benchmark designed to evaluate the ability of large language models and agents to generate reliable, decision-oriented climate services from subseasonal-to-seasonal forecasts across multiple domains.
Contribution
The paper presents a new benchmark dataset for last-mile climate services that covers diverse applications and levels, and it evaluates current models on this dataset to identify key challenges in understanding and reasoning under uncertainty.
Findings
Models struggle to understand service plots and signals.
Operationalizing uncertainty remains a major challenge.
Stable, evidence-based decision analysis is difficult for current models.
Abstract
Subseasonal-to-seasonal (S2S) forecasts play an essential role in providing a decision-critical weeks-to-months planning window for climate resilience and sustainability, yet a growing bottleneck is the last-mile gap: translating scientific forecasts into trusted, actionable climate services, requiring reliable multimodal understanding and decision-facing reasoning under uncertainty. Meanwhile, multimodal large language models (MLLMs) and corresponding agentic paradigms have made rapid progress in supporting various workflows, but it remains unclear whether they can reliably generate decision-making deliverables from operational service products (e.g., actionable signal comprehension, decision-making handoff, and decision analysis & planning) under uncertainty. We introduce S2SServiceBench, a multimodal benchmark for last-mile S2S climate services curated from an operational…
Taxonomy
Topics: Meteorological Phenomena and Simulations · Climate variability and models · Tropical and Extratropical Cyclones Research
