JW-VL: A Vision-Language Model for Solar Physics
Mingfu Shao, Hui Wang, Liyue Tong, Yuyang Li, Cunshi Wang, Jiaben Lin, Suo Liu, Haiqing Xu, Yin Zhang, Jing Huang

TL;DR
JW-VL is a specialized vision-language model tailored for solar physics that integrates multi-wavelength data to enhance solar image analysis and reasoning tasks.
Contribution
The paper introduces JW-VL, a fine-tuned foundation model for solar physics that combines multi-wavelength observational data with a cross-modal alignment knowledge distillation framework to support solar image recognition, analysis, and reasoning.
Findings
JW-VL enables end-to-end solar data modeling.
It supports tasks like image recognition, question answering, and OCR.
A solar activity report agent demonstrates interdisciplinary application.
Abstract
Vision-Language Models (VLMs) have achieved breakthrough progress in general knowledge domains, yet adaptation to specialized scientific fields remains challenging due to multimodal representation shifts and the limited integration of domain-specific knowledge. To address the limitations of general-purpose VLMs when applied to solar physics image recognition, analysis, and reasoning, we propose JinWu Vision-Language (JW-VL), a fine-tuned foundation model tailored for solar physics. The model integrates multi-wavelength observational data from both space-based and ground-based telescopes, encompassing representative spectral bands spanning the photosphere, chromosphere, and corona. Built upon a cross-modal alignment knowledge distillation framework, JW-VL learns a joint visual-semantic embedding that enables end-to-end modeling from raw solar observational data to downstream tasks,…
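The abstract names a cross-modal alignment knowledge distillation framework but does not give its objective. As a generic illustration only, and not the authors' formulation, the sketch below combines a standard softened-logit distillation term with a cosine alignment term between an image embedding and a text embedding. All function names, the `temperature` and `alpha` parameters, and the choice of loss terms are assumptions made for this example.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax, shifted by the max for numerical stability.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) for two discrete distributions of equal length.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def cosine_similarity(u, v):
    # Assumes both vectors are non-zero.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def distillation_alignment_loss(student_logits, teacher_logits,
                                image_emb, text_emb,
                                temperature=2.0, alpha=0.5):
    # Hypothetical combined objective (an assumption, not from the paper):
    #   alpha * KD term + (1 - alpha) * alignment term.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Conventional T^2 scaling of the softened-logit distillation term.
    kd_term = kl_divergence(p_teacher, p_student) * temperature ** 2
    # Alignment term is 0 when the two embeddings point the same way.
    align_term = 1.0 - cosine_similarity(image_emb, text_emb)
    return alpha * kd_term + (1.0 - alpha) * align_term

if __name__ == "__main__":
    loss = distillation_alignment_loss(
        student_logits=[1.0, 0.5, -0.2],
        teacher_logits=[2.0, 0.1, -1.0],
        image_emb=[0.3, 0.8, 0.1],
        text_emb=[0.2, 0.9, 0.0],
    )
    print(f"combined loss: {loss:.4f}")
```

In this sketch the loss vanishes when the student reproduces the teacher's logits and the image and text embeddings coincide, and it grows as either the predictions or the embeddings drift apart.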
