T-Gated Adapter: A Lightweight Temporal Adapter for Vision-Language Medical Segmentation

Pranjal Khadka

arXiv:2604.08167·cs.CV·April 10, 2026

TL;DR

This paper introduces T-Gated Adapter, a lightweight temporal transformer-based module that enhances vision-language models for 3D medical image segmentation by incorporating adjacent-slice context, improving accuracy and cross-domain robustness.

Contribution

The paper proposes a temporal adapter that combines a temporal transformer, a spatial refinement block, and an adaptive gate to bring 3D (adjacent-slice) context into vision-language models for medical segmentation.
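To make the idea of slice-wise attention with gating concrete, here is a minimal pure-Python sketch for a single token position. It is an illustration under stated assumptions, not the paper's implementation: the function names (`temporal_attend`, `gated_blend`), the unprojected dot-product attention, and the scalar sigmoid gate that mixes the original token with the cross-slice context are all hypothetical choices, and the spatial refinement block is omitted entirely.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attend(window, center):
    """Attend from the center slice's token to the same token in each slice.

    `window` is a list of token vectors, one per slice in the context window.
    Assumption: plain scaled dot-product attention with no learned projections.
    """
    query = window[center]
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in window]
    weights = softmax(scores)
    context = [
        sum(w * tok[d] for w, tok in zip(weights, window))
        for d in range(len(query))
    ]
    return context, weights


def gated_blend(original, context, gate_logit):
    """Mix the original token with its temporal context.

    Assumption: a single scalar gate g = sigmoid(gate_logit); g = 0 keeps the
    original token unchanged, g = 1 uses only the cross-slice context.
    """
    g = 1.0 / (1.0 + math.exp(-gate_logit))
    return [(1.0 - g) * o + g * c for o, c in zip(original, context)]


# Three adjacent slices, one 2-dimensional token each (toy values).
window = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context, weights = temporal_attend(window, center=1)
blended = gated_blend(window[1], context, gate_logit=0.0)
```

With a strongly negative `gate_logit` the output collapses to the original single-slice token, which is one way a gate of this kind could fall back to the base model's representation when neighbouring slices are unhelpful.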

Findings

01

Achieves a mean Dice of 0.704 on FLARE22, a +0.206 improvement over baseline.

02

In zero-shot evaluation, Dice improves by +0.210 on BTCV and +0.230 on AMOS22.

03

Cross-modality evaluation on MRI yields a Dice of 0.366, higher than supervised 3D models, indicating better generalization.
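The findings above are all reported as Dice scores. As a reminder of what that number measures, the following sketch computes the standard Dice coefficient, 2|A ∩ B| / (|A| + |B|), for binary masks and averages it over classes. The helper names `dice_score` and `mean_dice`, the empty-mask convention, and the per-class averaging are assumptions for illustration; the exact evaluation protocol behind the reported figures is not stated here.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


def mean_dice(pred_labels, target_labels, classes):
    """Average the per-class Dice over the given class ids (e.g. organs)."""
    scores = [
        dice_score(
            [int(p == c) for p in pred_labels],
            [int(t == c) for t in target_labels],
        )
        for c in classes
    ]
    return sum(scores) / len(scores)


# Toy example: six voxels, one foreground class.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
score = dice_score(pred, target)  # 2 * 2 / (3 + 3)

# Toy multi-class example with labels 0 (background), 1 and 2.
multi = mean_dice([1, 1, 2, 2, 0], [1, 2, 2, 2, 0], classes=[1, 2])
```

On this scale, 0 means no overlap and 1 means a perfect match, so an absolute gain of about 0.2 is a large change.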

Abstract

Medical image segmentation traditionally relies on fully supervised 3D architectures that demand large amounts of dense, voxel-level annotation from clinical experts, a prohibitively expensive process. Vision-language models (VLMs) offer a powerful alternative by leveraging broad visual-semantic representations learned from billions of images. However, when applied independently to the 2D slices of a 3D scan, these models often produce noisy, anatomically implausible segmentations that violate the inherent continuity of anatomical structures. We propose a temporal adapter that addresses this by injecting adjacent-slice context directly into the model's visual token representations. The adapter comprises a temporal transformer that attends across a fixed context window at the token level, a spatial context block that refines within-slice representations, and an adaptive gate balancing…
