TL;DR
Solar-VLM introduces a multimodal, large-language-model-driven framework that fuses satellite imagery, weather data, and temporal observations for improved photovoltaic power forecasting.
Contribution
Solar-VLM is a unified model that combines modality-specific encoders with a cross-site fusion mechanism, advancing the integration of heterogeneous data sources for PV forecasting.
Findings
Solar-VLM outperforms existing models on data from eight PV stations in China.
It effectively captures spatiotemporal dependencies across multiple data modalities.
Code is publicly available at https://github.com/rhp413/Solar-VLM.
Abstract
Photovoltaic (PV) power forecasting plays a critical role in power system dispatch and market participation. Because PV generation is highly sensitive to weather conditions and cloud motion, accurate forecasting requires effective modeling of complex spatiotemporal dependencies across multiple information sources. Although recent studies have advanced AI-based forecasting methods, most fail to fuse temporal observations, satellite imagery, and textual weather information in a unified framework. This paper proposes Solar-VLM, a large-language-model-driven framework for multimodal PV power forecasting. First, modality-specific encoders are developed to extract complementary features from heterogeneous inputs. The time-series encoder adopts a patch-based design to capture temporal patterns from multivariate observations at each site. The visual encoder, built upon a Qwen-based vision…
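The patch-based idea behind the time-series encoder can be illustrated with a small sketch. The abstract only states that the encoder "adopts a patch-based design"; the function names `make_patches` and `flatten_patch`, the patch length, the stride, and the flattening step are all assumptions made for illustration and are not taken from the Solar-VLM code.

```python
from typing import List

Series = List[List[float]]  # shape: (time steps, variables)


def make_patches(series: Series, patch_len: int, stride: int) -> List[Series]:
    """Split a multivariate series into (possibly overlapping) patches.

    Hypothetical helper: trailing steps that do not fill a whole
    patch are dropped.
    """
    if patch_len <= 0 or stride <= 0:
        raise ValueError("patch_len and stride must be positive")
    n_steps = len(series)
    patches = []
    for start in range(0, n_steps - patch_len + 1, stride):
        patches.append(series[start:start + patch_len])
    return patches


def flatten_patch(patch: Series) -> List[float]:
    """Flatten one patch into a single token vector (time-major order)."""
    return [value for step in patch for value in step]


# Toy input: 8 time steps, 2 variables per step (values are made up).
series = [[float(t), float(10 * t)] for t in range(8)]

# Assumed settings: patches of 4 steps, moved forward 2 steps at a time.
patches = make_patches(series, patch_len=4, stride=2)
tokens = [flatten_patch(p) for p in patches]
```

In a real model each flattened patch would typically pass through a learned projection before reaching the language model; that step is omitted here because the abstract does not describe it.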
