Chain-of-Adaptation: Surgical Vision-Language Adaptation with Reinforcement Learning

Jiajie Li; Chenhui Xu; Meihuan Liu; Jinjun Xiong

arXiv:2603.20116·cs.CV·March 23, 2026

Open Access

TL;DR

The paper introduces Chain-of-Adaptation, a reinforcement learning-based framework that enhances domain-specific adaptation of vision-language models in surgical contexts while preserving their general capabilities.

Contribution

The paper presents a structured reasoning approach to domain adaptation that preserves the multimodal priors of vision-language models and improves their generalization.

Findings

1. Achieves higher accuracy on surgical benchmarks

2. Demonstrates better out-of-distribution generalization

3. Maintains core visual-language abilities

Abstract

Conventional fine-tuning on domain-specific datasets can inadvertently alter a model's pretrained multimodal priors, leading to reduced generalization. To address this, we propose Chain-of-Adaptation (CoA), an adaptation framework designed to integrate domain knowledge while maintaining the model's inherent reasoning and perceptual capabilities. CoA introduces a structured reasoning format and uses reinforcement learning to enhance domain alignment without sacrificing general multimodal competence. Experiments on standard surgical benchmarks, under both in-distribution and out-of-distribution settings, demonstrate that CoA achieves higher accuracy, stronger generalization, and more stable behavior than supervised fine-tuning. Furthermore, ablation studies confirm that CoA effectively preserves the model's core visual-language abilities, providing a reliable pathway for domain…

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis