Chain-of-Adaptation: Surgical Vision-Language Adaptation with Reinforcement Learning
Jiajie Li, Chenhui Xu, Meihuan Liu, Jinjun Xiong

TL;DR
The paper introduces Chain-of-Adaptation, a reinforcement learning-based framework that enhances domain-specific adaptation of vision-language models in surgical contexts while preserving their general capabilities.
Contribution
It presents a novel structured reasoning approach for domain adaptation that maintains multimodal priors and improves generalization in vision-language models.
Findings
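Achieves higher accuracy than supervised fine-tuning on standard surgical benchmarks
Demonstrates stronger out-of-distribution generalization and more stable behavior than supervised fine-tuning
Preserves the model's core visual-language abilities, as confirmed by ablation studies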
Abstract
Conventional fine-tuning on domain-specific datasets can inadvertently alter a model's pretrained multimodal priors, leading to reduced generalization. To address this, we propose Chain-of-Adaptation (CoA), an adaptation framework designed to integrate domain knowledge while maintaining the model's inherent reasoning and perceptual capabilities. Using reinforcement learning, CoA introduces a structured reasoning format that enhances domain alignment without sacrificing general multimodal competence. Experiments on standard surgical benchmarks, under both in-distribution and out-of-distribution settings, demonstrate that CoA achieves higher accuracy, stronger generalization, and more stable behavior than supervised fine-tuning. Furthermore, ablation studies confirm that CoA effectively preserves the model's core visual-language abilities, providing a reliable pathway for domain…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis
