TubeMLLM: A Foundation Model for Topology Knowledge Exploration in Vessel-like Anatomy
Yaoyu Liu, Minghui Zhang, Xin You, Hanxiao Zhang, Yun Gu

TL;DR
TubeMLLM is a novel foundation model that enhances topology-aware perception of vessel-like anatomy in medical images by integrating topological priors and multimodal learning, achieving state-of-the-art results across diverse datasets.
Contribution
The paper introduces TubeMLLM, a unified multimodal foundation model with explicit topological priors and an adaptive loss, advancing topology consistency and zero-shot generalization in medical vessel analysis.
Findings
Achieves state-of-the-art out-of-distribution performance
Significantly reduces topological errors in color fundus images
Excels in zero-shot transfer to unseen X-ray angiography
Abstract
Modeling medical vessel-like anatomy is challenging due to its intricate topology and sensitivity to dataset shifts. Consequently, task-specific models often suffer from topological inconsistencies, including artificial disconnections and spurious merges. Motivated by the promise of multimodal large language models (MLLMs) for zero-shot generalization, we propose TubeMLLM, a unified foundation model that couples structured understanding with controllable generation for medical vessel-like anatomy. By integrating topological priors through explicit natural language prompting and aligning them with visual representations in a shared-attention architecture, TubeMLLM significantly enhances topology-aware perception. Furthermore, we construct TubeMData, a pioneering multimodal benchmark comprising comprehensive topology-centric tasks, and introduce an adaptive loss weighting strategy to…
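The two failure modes named in the abstract, artificial disconnections and spurious merges, both change the number of connected components in a segmentation mask. The sketch below illustrates this with a Betti-0 error (the absolute difference in 4-connected component counts between a prediction and a reference). It is an illustrative assumption, not the paper's evaluation code; the function names count_components and betti0_error are hypothetical.

```python
from collections import deque


def count_components(mask):
    """Count 4-connected foreground components (Betti-0) in a 2D binary mask.

    Hypothetical helper for illustration; mask is a list of lists of 0/1.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                count += 1
                # Flood-fill the whole component with breadth-first search.
                queue = deque([(r, c)])
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count


def betti0_error(pred, ref):
    """Absolute difference in component counts (assumed illustrative metric)."""
    return abs(count_components(pred) - count_components(ref))


# Artificial disconnection: one vessel is predicted as two fragments.
ref_vessel = [[1, 1, 1, 1, 1]]
pred_broken = [[1, 1, 0, 1, 1]]

# Spurious merge: two separate vessels are joined by a false bridge pixel.
ref_pair = [[1, 1, 1, 1, 1],
            [0, 0, 0, 0, 0],
            [1, 1, 1, 1, 1]]
pred_merged = [[1, 1, 1, 1, 1],
               [0, 0, 1, 0, 0],
               [1, 1, 1, 1, 1]]

print(betti0_error(pred_broken, ref_vessel))  # 1
print(betti0_error(pred_merged, ref_pair))    # 1
```

Component counting alone is a coarse signal: a prediction containing one break and one merge can match the reference count while still being topologically wrong, which is why richer topology-centric measures are typically used alongside it.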
Taxonomy
Topics: Retinal Imaging and Analysis · Multimodal Machine Learning Applications · Advanced Neural Network Applications
