TubeMLLM: A Foundation Model for Topology Knowledge Exploration in Vessel-like Anatomy

Yaoyu Liu; Minghui Zhang; Xin You; Hanxiao Zhang; Yun Gu

arXiv:2603.09217 · cs.CV · March 16, 2026

Open Access

TL;DR

TubeMLLM is a novel foundation model that enhances topology-aware perception of vessel-like anatomy in medical images by integrating topological priors and multimodal learning, achieving state-of-the-art results across diverse datasets.

Contribution

The paper introduces TubeMLLM, a unified multimodal foundation model with explicit topological priors and an adaptive loss, advancing topology consistency and zero-shot generalization in medical vessel analysis.

Findings

01. Achieves state-of-the-art out-of-distribution performance
02. Significantly reduces topological errors in color fundus images
03. Excels in zero-shot transfer to unseen X-ray angiography

Abstract

Modeling medical vessel-like anatomy is challenging because of its intricate topology and its sensitivity to dataset shifts. As a result, task-specific models often suffer from topological inconsistencies, including artificial disconnections and spurious merges. Motivated by the promise of multimodal large language models (MLLMs) for zero-shot generalization, we propose TubeMLLM, a unified foundation model that couples structured understanding with controllable generation for medical vessel-like anatomy. By injecting topological priors through explicit natural-language prompts and aligning them with visual representations in a shared-attention architecture, TubeMLLM significantly enhances topology-aware perception. Furthermore, we construct TubeMData, a pioneering multimodal benchmark comprising comprehensive topology-centric tasks, and introduce an adaptive loss weighting strategy to…
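
The abstract names two kinds of topological error: artificial disconnections and spurious merges. One common way to quantify both in a binary vessel mask is the Betti-0 error, the absolute difference between the number of connected foreground components in a prediction and in the reference. A gap in a vessel splits one component into two, and a false bridge fuses two components into one, so each error shifts the count by one. The abstract does not say which topology metrics TubeMLLM reports, so treat this as an illustrative assumption. The minimal pure-Python sketch below uses hypothetical helper names (`count_components`, `betti0_error`) and a plain breadth-first search.

```python
from collections import deque
from typing import List

Grid = List[List[int]]  # binary mask: 1 = vessel pixel, 0 = background


def count_components(mask: Grid, connectivity: int = 8) -> int:
    """Count foreground connected components (the 0th Betti number) of a 2D mask.

    Hypothetical helper for illustration; not taken from the TubeMLLM paper.
    """
    if connectivity == 4:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif connectivity == 8:
        offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]
    else:
        raise ValueError("connectivity must be 4 or 8")

    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    components = 0

    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                # Found an unvisited vessel pixel: flood-fill its whole component.
                components += 1
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in offsets:
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return components


def betti0_error(pred: Grid, target: Grid, connectivity: int = 8) -> int:
    """Absolute difference in component counts between a prediction and its reference."""
    return abs(count_components(pred, connectivity) - count_components(target, connectivity))
```

For example, a single horizontal vessel with one missing pixel in the middle counts as two components instead of one (Betti-0 error 1), and two parallel vessels joined by a one-pixel bridge count as one component instead of two (also error 1). Because the metric compares only counts, a disconnection and a merge in the same image can cancel out, which is why loop counts and skeleton-based measures are often reported alongside it.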

Taxonomy

Topics: Retinal Imaging and Analysis · Multimodal Machine Learning Applications · Advanced Neural Network Applications