Is Visual in-Context Learning for Compositional Medical Tasks within Reach?

Simon Reiß; Zdravko Marinov; Alexander Jaus; Constantin Seibold; M. Saquib Sarfraz; Erik Rodner; Rainer Stiefelhagen

arXiv:2507.00868 · cs.CV · July 3, 2025

Open Access

TL;DR

This paper investigates the feasibility of visual in-context learning for complex, multi-step medical tasks, introducing a synthetic task generation method and analyzing training strategies to improve model adaptability.

Contribution

It presents a novel synthetic task generation engine and insights into training in-context learners for compositional visual tasks, especially in medical applications.

Findings

01. Visual in-context learning can handle multi-step tasks with proper training.

02. Synthetic task generation improves model adaptability to complex tasks.

03. Masking strategies influence training effectiveness for compositional tasks.

Abstract

In this paper, we explore the potential of visual in-context learning to enable a single model to handle multiple tasks and adapt to new tasks during test time without re-training. Unlike previous approaches, our focus is on training in-context learners to adapt to sequences of tasks, rather than individual tasks. Our goal is to solve complex tasks that involve multiple intermediate steps using a single model, allowing users to define entire vision pipelines flexibly at test time. To achieve this, we first examine the properties and limitations of visual in-context learning architectures, with a particular focus on the role of codebooks. We then introduce a novel method for training in-context learners using a synthetic compositional task generation engine. This engine bootstraps task sequences from arbitrary segmentation datasets, enabling the training of visual in-context learners for…
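To make the idea of bootstrapping task sequences from segmentation masks more concrete, the following is a minimal sketch and not the authors' engine. It assumes that a compositional task is an ordered list of simple mask operations and that an in-context episode applies one sampled sequence to several masks, keeping every intermediate result. The operations (`flip`, `invert`, `dilate`) and all function names are hypothetical and chosen only for illustration.

```python
import random
from typing import Callable, Dict, List, Tuple

Mask = List[List[int]]  # binary segmentation mask, 1 = foreground


def flip_horizontal(mask: Mask) -> Mask:
    """Mirror the mask left to right."""
    return [row[::-1] for row in mask]


def invert(mask: Mask) -> Mask:
    """Swap foreground and background."""
    return [[1 - value for value in row] for row in mask]


def dilate(mask: Mask) -> Mask:
    """Grow the foreground by one pixel in the 4-neighbourhood."""
    height, width = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    for y in range(height):
        for x in range(width):
            if mask[y][x]:
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        out[ny][nx] = 1
    return out


# Hypothetical operation set; the paper's actual operations may differ.
OPERATIONS: Dict[str, Callable[[Mask], Mask]] = {
    "flip": flip_horizontal,
    "invert": invert,
    "dilate": dilate,
}


def sample_task(num_steps: int, rng: random.Random) -> List[str]:
    """Draw a random sequence of operation names (one compositional task)."""
    names = sorted(OPERATIONS)
    return [rng.choice(names) for _ in range(num_steps)]


def apply_task(mask: Mask, task: List[str]) -> List[Mask]:
    """Return the input mask followed by the result of every step."""
    steps = [mask]
    for name in task:
        steps.append(OPERATIONS[name](steps[-1]))
    return steps


def make_episode(
    masks: List[Mask], num_steps: int, rng: random.Random
) -> Tuple[List[str], List[List[Mask]]]:
    """Apply one sampled task to every mask, as context and query examples."""
    task = sample_task(num_steps, rng)
    return task, [apply_task(mask, task) for mask in masks]
```

In this sketch, every mask in an episode shares the same operation sequence, so the first few step lists could serve as in-context demonstrations and the last one as the query whose intermediate steps a model would have to predict.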

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications