MuViT: Multi-Resolution Vision Transformers for Learning Across Scales in Microscopy

Albert Dominguez Mantes; Gioele La Manno; Martin Weigert

arXiv:2602.24222·cs.CV·March 2, 2026

Open Access

TL;DR

MuViT is a transformer architecture that fuses multi-resolution microscopy images by embedding patches from every resolution into a shared world-coordinate system, so a single encoder can combine wide-field context with high-resolution detail.

Contribution

Introduces MuViT, a multi-resolution vision transformer that integrates multi-scale microscopy data by embedding all patches in a shared world-coordinate system and extending rotary positional embeddings to those coordinates.

Findings

01. MuViT outperforms ViT and CNN baselines across synthetic benchmarks, kidney histopathology, and high-resolution mouse-brain microscopy.

02. Multi-resolution MAE pretraining yields scale-consistent representations.

03. Explicit world-coordinate modeling enhances multi-scale microscopy analysis.

Abstract

Modern microscopy routinely produces gigapixel images that contain structures across multiple spatial scales, from fine cellular morphology to broader tissue organization. Many analysis tasks require combining these scales, yet most vision models operate at a single resolution or derive multi-scale features from one view, limiting their ability to exploit the inherently multi-resolution nature of microscopy data. We introduce MuViT, a transformer architecture built to fuse true multi-resolution observations from the same underlying image. MuViT embeds all patches into a shared world-coordinate system and extends rotary positional embeddings to these coordinates, enabling attention to integrate wide-field context with high-resolution detail within a single encoder. Across synthetic benchmarks, kidney histopathology, and high-resolution mouse-brain microscopy, MuViT delivers consistent…
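The shared world-coordinate idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names `patch_centers` and `rope_2d`, the `base` frequency constant, and the choice to split the feature vector into a y-half and an x-half are all assumptions made for illustration. The sketch places patches from two resolution levels in the same physical coordinate frame and applies a rotary embedding driven by those continuous coordinates, so that the query-key dot product depends only on the world-space offset between two patches, regardless of which resolution each came from.

```python
import math


def patch_centers(origin, pixel_size, patch_px, grid):
    """World-space (y, x) centers of a grid x grid layout of square patches.

    origin:     world position of the crop's top-left corner (hypothetical convention)
    pixel_size: world units per pixel at this resolution level
    patch_px:   patch side length in pixels
    """
    step = patch_px * pixel_size
    oy, ox = origin
    return [(oy + (i + 0.5) * step, ox + (j + 0.5) * step)
            for i in range(grid) for j in range(grid)]


def rope_2d(vec, pos, base=100.0):
    """Rotate consecutive feature pairs by angles proportional to world coordinates.

    The first half of `vec` is rotated by the y coordinate, the second half
    by the x coordinate (an assumed layout). len(vec) must be divisible by 4.
    """
    d = len(vec)
    assert d % 4 == 0, "feature dimension must be divisible by 4"
    half = d // 2
    out = []
    for axis in range(2):
        seg = vec[axis * half:(axis + 1) * half]
        for i in range(0, half, 2):
            angle = pos[axis] * base ** (-i / half)
            c, s = math.cos(angle), math.sin(angle)
            a, b = seg[i], seg[i + 1]
            out += [a * c - b * s, a * s + b * c]
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Two views of the same 64 x 64 world-unit region:
# a fine level (1 unit/pixel, 4 x 4 patches) and a coarse level (4 units/pixel, 1 patch).
fine = patch_centers((0.0, 0.0), pixel_size=1.0, patch_px=16, grid=4)
coarse = patch_centers((0.0, 0.0), pixel_size=4.0, patch_px=16, grid=1)

q = [0.3, -1.2, 0.7, 0.5, -0.4, 0.9, 1.1, -0.6]   # toy query from a fine patch
k = [1.0, 0.2, -0.8, 0.4, 0.6, -0.3, 0.5, 0.7]    # toy key from the coarse patch
score = dot(rope_2d(q, fine[0]), rope_2d(k, coarse[0]))
```

Because each feature pair is rotated by an angle that is linear in the coordinate, translating both patches by the same world offset leaves `score` unchanged; only their relative placement matters. That property is what lets patches with different pixel sizes share one attention computation in this sketch.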

Taxonomy

Topics: Cell Image Analysis Techniques · AI in cancer detection · Digital Holography and Microscopy