Beyond ViT Tokens: Masked-Diffusion Pretrained Convolutional Pathology Foundation Model for Cell-Level Dense Prediction

Weiming Chen; Xitong Ling; Zhenyang Cai; Xidong Wang; Jiawen Li; Tian Guan; Benyou Wang; Yonghong He

arXiv:2605.08276·cs.CV·May 12, 2026

TL;DR

This paper introduces ConvNeXt Masked-Diffusion (CMD), a convolutional pathology foundation model for cell-level dense prediction. It is reported to outperform ViT-based models, with stronger robustness when annotations are limited.

Contribution

The paper proposes a fully convolutional, self-supervised pretraining framework based on masked diffusion in pixel space, and reports better performance and robustness than ViT-based models on cell-level dense prediction tasks in pathology.

Findings

01. CMD outperforms existing ViT-based models in dense prediction tasks.

02. CMD surpasses state-of-the-art segmentation methods with fewer task-specific parameters.

03. CMD shows stronger robustness and generalization under limited annotations.

Abstract

Cell-level dense prediction is central to computational pathology, but remains challenging due to fine-grained histological structures, strong domain shifts, and costly dense annotations. Existing ViT-based pathology foundation models rely on patch tokenization, which can disrupt spatial continuity and weaken local morphological details needed for cell-level prediction. To address this, we propose Masked-Diffusion Convolutional Foundation Models, termed ConvNeXt Masked-Diffusion (CMD), a self-supervised convolutional generative pretraining framework for dense pathology representation learning. CMD uses a fully convolutional ConvNeXt-UNet backbone, performs masked-diffusion pretraining in pixel space, and incorporates frozen pathology foundation model features through adaptive normalization. Experimental results demonstrate that CMD consistently outperforms existing ViT-based pathology…
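The abstract names two mechanisms, masked diffusion in pixel space and conditioning through adaptive normalization, without giving their details here. The following is a minimal sketch of what such operations commonly look like, not the authors' implementation. The patch-wise masking scheme, the choice to keep unmasked pixels clean, the `alpha_bar` noise level, the `(1 + scale)` modulation form, and all function names are assumptions made for illustration. In a real model the scale and shift would be predicted from the frozen foundation-model features; here they are passed in as plain numbers.

```python
import math
import random


def patch_mask(height, width, patch, mask_ratio, rng):
    """Return an HxW 0/1 mask with a random subset of patches set to 1.

    Assumes height and width are multiples of `patch` (illustrative choice).
    """
    rows, cols = height // patch, width // patch
    n_masked = round(mask_ratio * rows * cols)
    masked = set(rng.sample(range(rows * cols), n_masked))
    return [
        [1 if (y // patch) * cols + (x // patch) in masked else 0
         for x in range(width)]
        for y in range(height)
    ]


def masked_forward_diffusion(image, mask, alpha_bar, rng):
    """Noise masked pixels only: x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps.

    Pixels with mask == 0 are returned unchanged (an assumption of this
    sketch). The sampled noise is returned as one possible regression target.
    """
    signal, sigma = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    noisy, noise = [], []
    for img_row, mask_row in zip(image, mask):
        eps_row = [rng.gauss(0.0, 1.0) for _ in img_row]
        noisy.append([
            signal * x + sigma * e if m else x
            for x, e, m in zip(img_row, eps_row, mask_row)
        ])
        noise.append(eps_row)
    return noisy, noise


def adaptive_norm(features, scale, shift, eps=1e-5):
    """Normalize one channel, then modulate it with a conditioning scale/shift."""
    mean = sum(features) / len(features)
    var = sum((f - mean) ** 2 for f in features) / len(features)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(f - mean) * inv_std * (1.0 + scale) + shift for f in features]
```

For example, with `rng = random.Random(0)` and an 8x8 image, `patch_mask(8, 8, 4, 0.5, rng)` masks two of the four 4x4 patches, and `masked_forward_diffusion` then perturbs only those 32 pixels. A denoising network trained on such inputs has to reconstruct the masked regions from the visible context, which is the general idea behind combining masking with diffusion.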

Code & Models

Hugging Face model: M4A1TasteGood/ConvNeXt_Masked_Diffusion
