TL;DR
VoxCor is a training-free method that builds reusable volumetric features from frozen 2D ViT models for multimodal 3D medical image correspondence, with no fine-tuning of the encoder.
Contribution
It proposes a fit-transform approach that combines triplanar ViT inference with a closed-form weighted partial least squares (WPLS) projection to extract cross-modal volumetric features without training.
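To make the fit-transform idea concrete, here is a minimal NumPy sketch under stated assumptions; it is not the paper's implementation. `encode_slice` is a hypothetical stand-in for a frozen 2D ViT that returns per-pixel features, the triplanar step simply averages the three viewing directions, and the projection is a weighted PLS-SVD variant (SVD of a weighted cross-covariance) chosen for illustration. The actual WPLS formulation in VoxCor may differ.

```python
import numpy as np

def triplanar_features(volume, encode_slice):
    """Average per-voxel features from slices taken along all three axes.

    volume: (D, H, W) array.
    encode_slice: hypothetical callable mapping a 2D slice (A, B) to
    (A, B, C) features; stands in for a frozen 2D ViT (assumption).
    """
    feats = []
    for axis in range(3):
        slices = np.moveaxis(volume, axis, 0)             # slicing axis first
        f = np.stack([encode_slice(s) for s in slices])   # (N, A, B, C)
        feats.append(np.moveaxis(f, 0, axis))             # back to (D, H, W, C)
    return np.mean(feats, axis=0)

def fit_wpls(X, Y, w, k):
    """Fit step: weighted PLS-SVD on corresponding voxel features (assumed variant).

    X, Y: (n, c) features at n corresponding voxels in two modalities.
    w: (n,) non-negative correspondence weights.
    Returns weighted means and k-dimensional projections for each modality.
    """
    w = w / w.sum()
    mx, my = w @ X, w @ Y                      # weighted means
    Xc, Yc = X - mx, Y - my
    C = Xc.T @ (w[:, None] * Yc)               # weighted cross-covariance (c, c)
    U, _, Vt = np.linalg.svd(C)                # closed form, no training loop
    return mx, U[:, :k], my, Vt[:k].T

def transform(F, mean, P):
    """Transform step: project features of a new volume with the fitted map."""
    return (F - mean) @ P

# Toy usage with random data in place of real ViT features.
rng = np.random.default_rng(0)
vol = rng.normal(size=(4, 5, 6))
toy_encoder = lambda s: np.stack([s, 2.0 * s], axis=-1)   # hypothetical encoder
F = triplanar_features(vol, toy_encoder)                  # (4, 5, 6, 2)

X = rng.normal(size=(200, 8))
Y = X @ rng.normal(size=(8, 8)) + 0.1 * rng.normal(size=(200, 8))
mx, Px, my, Py = fit_wpls(X, Y, np.ones(200), k=3)
Zx = transform(X, mx, Px)                                 # (200, 3)
```

The fitted means and projections are computed once and then applied to unseen volumes through `transform`, which is what makes the representation reusable in this sketch rather than tied to a single image pair.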
Findings
Enhances cross-modality and cross-subject correspondence transfer.
Reduces encoder sensitivity in dense correspondence tasks.
Achieves registration performance comparable to handcrafted and learned features.
Abstract
Cross-modal 3D medical image analysis requires voxelwise representations that remain anatomically consistent across imaging contrasts, scanners, and acquisition protocols. Recent work has shown that frozen 2D Vision Transformer (ViT) foundation models can support such representations, but typical pipelines extract features along a single anatomical axis and adapt those features inside a registration solver for one image pair at a time, leaving complementary viewing directions unused and producing representations that do not transfer to new volumes. We introduce VoxCor, a training-free fit-transform method for reusable volumetric feature representations from frozen 2D ViT foundation models. During an offline fitting phase, VoxCor combines triplanar ViT inference with a compact closed-form weighted partial least squares (WPLS) projection that uses fitting-time voxel correspondences to…
