Weather-R1: Logically Consistent Reinforcement Fine-Tuning for Multimodal Reasoning in Meteorology
Kaiyu Wu, Pucheng Han, Hualong Zhang, Naigeng Wu, Keze Wang

TL;DR
Weather-R1 is a multimodal reasoning model for meteorology. It is trained with LoCo-RFT, a reinforcement fine-tuning method that adds a logical consistency reward, and evaluated on WeatherQA, a new benchmark, with the aim of improving both reasoning faithfulness and performance.
Contribution
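The paper presents Weather-R1, which the authors describe as, to the best of their knowledge, the first reasoning VLM with logical faithfulness in meteorology. It also proposes LoCo-RFT, a reinforcement fine-tuning approach with a logical consistency reward, and constructs WeatherQA, a multimodal reasoning benchmark in meteorology.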
Findings
Weather-R1 improves WeatherQA performance by 9.8 percentage points.
LoCo-RFT resolves Self-Contradictory Reasoning (Self-Contra), where the model's reasoning contradicts its final answer.
Weather-R1 outperforms baseline models including Supervised Fine-Tuning and RFT.
Abstract
While Vision Language Models (VLMs) show advancing reasoning capabilities, their application in meteorology is constrained by a domain gap and a reasoning faithfulness gap. Specifically, mainstream Reinforcement Fine-Tuning (RFT) can induce Self-Contradictory Reasoning (Self-Contra), where the model's reasoning contradicts its final answer, which is unacceptable in such a high-stakes domain. To address these challenges, we construct WeatherQA, a novel multimodal reasoning benchmark in meteorology. We also propose Logically Consistent Reinforcement Fine-Tuning (LoCo-RFT), which resolves Self-Contra by introducing a logical consistency reward. Furthermore, we introduce Weather-R1, the first reasoning VLM with logical faithfulness in meteorology, to the best of our knowledge. Experiments demonstrate that Weather-R1 improves performance on WeatherQA by 9.8 percentage points over the…
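The abstract does not specify how the logical consistency reward is computed. As a rough illustration only, the idea of rewarding agreement between the reasoning and the final answer might be sketched as below. Everything here is an assumption rather than the authors' method: the `<think>`/`<answer>` output format, the multiple-choice options A to D, the "last option named in the reasoning" heuristic, and the weights `w_acc` and `w_con` are all hypothetical.

```python
import re

# Hypothetical output format (an assumption, not taken from the paper):
#   <think>free-text reasoning</think><answer>B</answer>
_OUTPUT = re.compile(
    r"<think>(.*?)</think>\s*<answer>\s*([A-D])\s*</answer>", re.DOTALL
)
# Hypothetical heuristic: the reasoning names its choice as "option X"
# or "answer is X", where X is an uppercase letter A-D.
_OPTION = re.compile(r"(?i:option|answer is)\s+([A-D])\b")


def parse_output(text):
    """Return (reasoning, answer) or None if the format does not match."""
    match = _OUTPUT.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2)


def reasoning_conclusion(reasoning):
    """Return the last option the reasoning names, or None if it names none."""
    hits = _OPTION.findall(reasoning)
    return hits[-1] if hits else None


def consistency_reward(text):
    """1.0 if the reasoning's last named option equals the final answer."""
    parsed = parse_output(text)
    if parsed is None:
        return 0.0
    reasoning, answer = parsed
    return 1.0 if reasoning_conclusion(reasoning) == answer else 0.0


def total_reward(text, gold, w_acc=1.0, w_con=0.5):
    """Weighted sum of answer accuracy and consistency (weights are assumed)."""
    parsed = parse_output(text)
    if parsed is None:
        return 0.0
    _, answer = parsed
    accuracy = 1.0 if answer == gold else 0.0
    return w_acc * accuracy + w_con * consistency_reward(text)


consistent = "<think>The trough deepens, so option B fits best.</think><answer>B</answer>"
contradictory = "<think>The trough deepens, so option B fits best.</think><answer>C</answer>"
```

Under this sketch, an accuracy-only reward would score `contradictory` as highly as a consistent response whenever the gold answer is C. The added consistency term withholds part of the reward in that case, which is the kind of pressure against Self-Contra that the abstract describes.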
