Learning Domain Knowledge in Multimodal Large Language Models through Reinforcement Fine-Tuning
Qinglong Cao, Yuntian Chen, Chao Ma, Xiaokang Yang

TL;DR
This paper demonstrates that integrating domain knowledge into multimodal large language models via reinforcement fine-tuning significantly improves their performance in specialized fields like remote sensing and medical imaging, surpassing traditional input-level methods.
Contribution
The authors introduce a reinforcement fine-tuning framework that incorporates domain knowledge as constraints and reward signals, addressing the limitations of textual conditioning in current models.
Findings
Reinforcement fine-tuning improves model performance in domain-specific tasks.
Input-level domain knowledge injection yields minimal benefits.
State-of-the-art results are achieved on remote sensing and medical imaging datasets.
Abstract
Multimodal large language models (MLLMs) have shown remarkable capabilities in multimodal perception and understanding tasks. However, their effectiveness in specialized domains, such as remote sensing and medical imaging, remains limited. A natural approach to domain adaptation is to inject domain knowledge through textual instructions, prompts, or auxiliary captions. Surprisingly, we find that such input-level domain knowledge injection yields little to no improvement on scientific multimodal tasks, even when the domain knowledge is explicitly provided. This observation suggests that current MLLMs fail to internalize domain-specific priors through language alone, and that domain knowledge must be integrated at the optimization level. Motivated by this insight, we propose a reinforcement fine-tuning framework that incorporates domain knowledge directly into the learning objective.…
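To make the idea of placing domain knowledge in the learning objective concrete, the following is a minimal sketch of a reward that combines task correctness with a penalty for violating a domain rule. It is not the authors' implementation: the function names, the exact-match task reward, the label vocabulary, and the penalty weight of 0.5 are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical type: a constraint maps a model answer to a violation score in [0, 1].
Constraint = Callable[[str], float]

# Assumed label vocabulary for an imaginary remote sensing task (not from the paper).
ALLOWED_LABELS = {"forest", "river", "farmland"}


@dataclass
class RewardConfig:
    # Assumed trade-off between task reward and constraint penalty.
    penalty_weight: float = 0.5


def in_vocabulary(answer: str) -> float:
    """Return 0.0 if the answer is a known domain label, else 1.0 (a violation)."""
    return 0.0 if answer.strip().lower() in ALLOWED_LABELS else 1.0


def domain_reward(
    answer: str,
    reference: str,
    constraints: Sequence[Constraint],
    cfg: RewardConfig = RewardConfig(),
) -> float:
    """Exact-match task reward minus a weighted mean of constraint violations."""
    task_reward = 1.0 if answer.strip().lower() == reference.strip().lower() else 0.0
    if constraints:
        violation = sum(c(answer) for c in constraints) / len(constraints)
    else:
        violation = 0.0
    return task_reward - cfg.penalty_weight * violation


if __name__ == "__main__":
    for candidate in ("Forest", "river", "spaceship"):
        print(candidate, domain_reward(candidate, "forest", [in_vocabulary]))
```

In a reinforcement fine-tuning loop, a scalar like this would score each sampled response, so the domain rule shapes the policy update rather than appearing only in the prompt.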
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning
