On Domain-Adaptive Post-Training for Multimodal Large Language Models
Daixuan Cheng, Shaohan Huang, Ziyu Zhu, Xintong Zhang, Wayne Xin Zhao, Zhongzhi Luan, Bo Dai, Zhenliang Zhang

TL;DR
This paper presents a systematic approach for domain adaptation of multimodal large language models through post-training, emphasizing data synthesis, training strategies, and extensive domain-specific evaluations.
Contribution
It introduces a generate-then-filter data synthesis pipeline, demonstrates the effectiveness of single-stage training for domain adaptation, and provides comprehensive evaluations across multiple high-impact domains.
Findings
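Data synthesized by the open-source generate-then-filter pipeline improves domain-specific performance more than data synthesized by manual rules or strong closed-source models.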
Single-stage training surpasses two-stage approaches for domain adaptation.
Extensive experiments validate improved performance in biomedicine, food, and remote sensing.
Abstract
Adapting general multimodal large language models (MLLMs) to specific domains, such as scientific and industrial fields, is highly significant in promoting their practical applications. This paper systematically investigates domain adaptation of MLLMs via post-training, focusing on data synthesis, training pipeline, and task evaluation. (1) Data Synthesis: Using only open-source models, we develop a generate-then-filter pipeline that curates diverse visual instruction tasks based on domain-specific image-caption pairs. The resulting data surpass the data synthesized by manual rules or strong closed-source models in enhancing domain-specific performance. (2) Training Pipeline: Unlike general MLLMs that typically adopt a two-stage training paradigm, we find that a single-stage approach is more effective for domain adaptation. (3) Task Evaluation: We conduct extensive experiments in…
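The generate-then-filter idea in point (1) can be illustrated with a minimal sketch. This summary does not say which open-source models generate the tasks or which criterion the filter applies, so every name below (`ImageCaptionPair`, `InstructionTask`, `generate_then_filter`, `toy_generate`, `toy_keep`) is hypothetical, and the toy generator and filter are placeholders rather than the authors' method.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class ImageCaptionPair:
    image_path: str  # path to a domain-specific image
    caption: str     # caption paired with that image


@dataclass
class InstructionTask:
    image_path: str
    instruction: str
    response: str


def generate_then_filter(
    pairs: Iterable[ImageCaptionPair],
    generate: Callable[[ImageCaptionPair], List[InstructionTask]],
    keep: Callable[[InstructionTask], bool],
) -> List[InstructionTask]:
    """Generate candidate tasks for each pair, then keep only those passing the filter."""
    kept: List[InstructionTask] = []
    for pair in pairs:
        for task in generate(pair):
            if keep(task):
                kept.append(task)
    return kept


# Hypothetical stand-in for a model that writes tasks from an image-caption pair.
def toy_generate(pair: ImageCaptionPair) -> List[InstructionTask]:
    return [
        InstructionTask(pair.image_path, "Describe the image.", pair.caption),
        InstructionTask(pair.image_path, "What is shown?", ""),  # deliberately empty
    ]


# Hypothetical filter: drop tasks with an empty response.
# The real filtering criterion is not given in this summary.
def toy_keep(task: InstructionTask) -> bool:
    return bool(task.response.strip())
```

In practice `generate` would wrap an open-source model and `keep` would encode whatever quality check the pipeline uses; passing both as callables keeps the two stages independent so either can be swapped out.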
Code & Models
- 🤗 AdaptLLM/medicine-LLM · model · 153 dl · ♡ 48
- 🤗 AdaptLLM/law-LLM · model · 265 dl · ♡ 84
- 🤗 AdaptLLM/finance-LLM · model · 200 dl · ♡ 155
- 🤗 AdaptLLM/finance-chat · model · 1.5k dl · ♡ 100
- 🤗 AdaptLLM/medicine-chat · model · 913 dl · ♡ 54
- 🤗 AdaptLLM/law-chat · model · 991 dl · ♡ 46
- 🤗 AdaptLLM/finance-LLM-13B · model · 44 dl · ♡ 49
- 🤗 AdaptLLM/medicine-LLM-13B · model · 46 dl · ♡ 24
- 🤗 AdaptLLM/law-LLM-13B · model · 376 dl · ♡ 41
- 🤗 AdaptLLM/Adapt-MLLM-to-Domains · model · ♡ 13
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
Methods: Byte Pair Encoding · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Residual Connection · Adam · Attention Is All You Need · Softmax · Label Smoothing · Dropout · Linear Layer
