DINO-BOLDNet: A DINOv3-Guided Multi-Slice Attention Network for T1-to-BOLD Generation
Jianwei Wang, Qing Wang, Menglan Ruan, Rongjun Ge, Chunfeng Yang, Yang Chen, Chunming Xie

TL;DR
This paper introduces DINO-BOLDNet, a novel transformer-guided neural network that generates BOLD images from T1-weighted images, improving structural-to-functional mapping with self-supervised learning and multi-slice attention.
Contribution
The paper presents the first framework for directly generating mean BOLD images from T1w images using a DINOv3-guided multi-slice attention model with a perceptual loss.
Findings
Outperforms conditional GAN baseline in PSNR and MS-SSIM
Uses self-supervised DINOv3 for structural feature extraction
Achieves accurate BOLD image generation from T1w images
Abstract
Generating BOLD images from T1w images offers a promising solution for recovering missing BOLD information and enabling downstream tasks when BOLD images are corrupted or unavailable. Motivated by this, we propose DINO-BOLDNet, a DINOv3-guided multi-slice attention framework that integrates a frozen self-supervised DINOv3 encoder with a lightweight trainable decoder. The model uses DINOv3 to extract within-slice structural representations, and a separate slice-attention module to fuse contextual information across neighboring slices. A multi-scale generation decoder then restores fine-grained functional contrast, while a DINO-based perceptual loss encourages structural and textural consistency between predictions and ground-truth BOLD in the transformer feature space. Experiments on a clinical dataset of 248 subjects show that DINO-BOLDNet surpasses a conditional GAN baseline in both…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsDomain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications
