Unified Medical Image Tokenizer for Autoregressive Synthesis and Understanding

Chenglong Ma; Yuanfeng Ji; Jin Ye; Zilong Li; Chenhui Wang; Junzhi Ning; Wei Li; Lihao Liu; Qiushan Guo; Tianbin Li; Junjun He; Hongming Shan

arXiv:2505.19225·eess.IV·April 2, 2026

TL;DR

This paper introduces MedITok, a unified medical image tokenizer trained on over 33 million images and 2 million image-text pairs, enabling autoregressive synthesis and understanding across diverse medical imaging modalities.

Contribution

It presents a novel two-stage training framework that leverages unpaired images and paired image-text data to build a versatile medical image tokenizer.

Findings

01. Achieves state-of-the-art results on over 30 benchmarks across 9 modalities.

02. Utilizes over 33 million images and 2 million image-text pairs for training.

03. Enables autoregressive modeling for diagnostic and generative medical applications.

Abstract

Autoregressive modeling has driven major advances in multimodal AI, yet its application to medical imaging remains constrained by the absence of a unified image tokenizer that simultaneously preserves fine-grained anatomical structures and rich clinical semantics across heterogeneous modalities. Existing approaches jointly optimize image reconstruction and textual semantic objectives, which requires large-scale image-caption pairs and is prone to gradient interference. This is ill-suited to the medical domain, where paired data are scarce and abundant unpaired images remain unexploited. This work identifies these issues in building unified medical image tokenizers and introduces a principled two-stage training framework that uses visual representation as a bridge to address them. The proposed visual representation alignment stage enables the utilization of large-scale unpaired medical images…

Code & Models

Repositories

Masaaki-75/meditok (GitHub)

Models

massaki75/meditok (Hugging Face model)
