TL;DR
This paper introduces TABLeT, a method that uses a pre-trained 2D natural-image autoencoder to tokenize fMRI volumes. This enables efficient long-range spatiotemporal modeling with a Transformer, and the model outperforms existing approaches on large-scale brain imaging datasets.
Contribution
The paper compresses each 3D fMRI volume into a compact set of continuous tokens using a pre-trained 2D autoencoder. This allows long-range temporal modeling with a simple Transformer encoder under limited GPU memory (VRAM).
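To make the tokenization idea concrete, here is a minimal shape-level sketch. It is not TABLeT's implementation. Several details are assumptions: a 3D volume is cut into axial 2D slices, each slice is encoded separately, and each slice's latent map is flattened into one continuous token. The encoder `encode_2d` is a hypothetical stand-in. Like many latent-diffusion image autoencoders, it is assumed to downsample each spatial side by 8 and emit 4 latent channels. Average pooling followed by a fixed per-channel scaling is used only to reproduce those output shapes. The volume size (64 × 64 × 48, 10 time points) is also illustrative.

```python
import numpy as np

def encode_2d(slice_2d: np.ndarray, factor: int = 8, latent_ch: int = 4,
              seed: int = 0) -> np.ndarray:
    """Hypothetical stand-in for a pre-trained 2D autoencoder's encoder.

    Average-pools the slice by `factor` per side, then applies a fixed random
    per-channel scaling so the output has shape (latent_ch, H/factor, W/factor).
    """
    h, w = slice_2d.shape
    assert h % factor == 0 and w % factor == 0, "slice size must divide by factor"
    pooled = slice_2d.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    scale = np.random.default_rng(seed).standard_normal(latent_ch)
    return scale[:, None, None] * pooled[None, :, :]

def tokenize_volume(volume: np.ndarray, factor: int = 8, latent_ch: int = 4) -> np.ndarray:
    """Encode each axial slice of a 3D volume (X, Y, Z) and flatten it into one token.

    Assumption: one token per slice along the last axis.
    Returns an array of shape (Z, latent_ch * X/factor * Y/factor).
    """
    tokens = [encode_2d(volume[:, :, z], factor, latent_ch).ravel()
              for z in range(volume.shape[2])]
    return np.stack(tokens)

def tokenize_run(run: np.ndarray, **kw) -> np.ndarray:
    """Tokenize a 4D run (X, Y, Z, T) into one sequence of T * Z tokens."""
    return np.concatenate([tokenize_volume(run[..., t], **kw) for t in range(run.shape[-1])])

run = np.random.default_rng(1).standard_normal((64, 64, 48, 10))  # hypothetical 4D run
seq = tokenize_run(run)
print(seq.shape)  # (480, 256)
print(run[..., 0].size // tokenize_volume(run[..., 0]).size)  # 16 (voxels per latent value)
```

With these assumed settings, 8 × 8 spatial downsampling combined with 4 channels stores each volume in 1/16 of its voxel count. The whole 10-volume run then becomes a sequence of 480 tokens.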
Findings
Outperforms existing models on the UK Biobank (UKB), Human Connectome Project (HCP), and ADHD-200 datasets.
Achieves better computational and memory efficiency than voxel-based methods.
Self-supervised masked-token pre-training improves downstream task performance.
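Masked-token pre-training generally hides a random subset of tokens and trains the model to reconstruct them. The sketch below shows that generic pattern, with the loss computed only on hidden positions. The summary does not give TABLeT's masking ratio, mask representation, or loss. The 50% ratio, the zero-vector mask, and the mean-squared-error objective used here are therefore assumptions chosen for illustration.

```python
import numpy as np

def mask_tokens(tokens: np.ndarray, mask_ratio: float, rng: np.random.Generator):
    """Randomly hide a fraction of tokens in a (seq_len, dim) sequence.

    Returns the corrupted sequence and a boolean mask marking hidden positions.
    Assumption: hidden tokens are replaced by a zero vector, standing in for a
    learned [MASK] embedding.
    """
    n = tokens.shape[0]
    n_mask = int(round(mask_ratio * n))
    mask = np.zeros(n, dtype=bool)
    mask[rng.choice(n, size=n_mask, replace=False)] = True
    corrupted = tokens.copy()
    corrupted[mask] = 0.0
    return corrupted, mask

def masked_mse(pred: np.ndarray, target: np.ndarray, mask: np.ndarray) -> float:
    """Mean squared reconstruction error over masked positions only."""
    diff = pred[mask] - target[mask]
    return float(np.mean(diff ** 2))

rng = np.random.default_rng(0)
tokens = rng.standard_normal((480, 256))  # e.g. the token sequence of one run
corrupted, mask = mask_tokens(tokens, mask_ratio=0.5, rng=rng)
pred = corrupted  # placeholder: a real encoder would predict hidden tokens from `corrupted`
print(int(mask.sum()))  # 240
print(masked_mse(pred, tokens, mask))
```

Visible tokens are excluded from the loss, so the model gains nothing by copying its input. It can only reduce the loss by inferring the hidden tokens from their spatiotemporal context.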
Abstract
Modeling long-range spatiotemporal dynamics in functional Magnetic Resonance Imaging (fMRI) remains a key challenge due to the high dimensionality of the four-dimensional signals. Prior voxel-based models, although demonstrating excellent performance and interpretation capabilities, are constrained by prohibitive memory demands and thus can only capture limited temporal windows. To address this, we propose TABLeT (Two-dimensionally Autoencoded Brain Latent Transformer), a novel approach that tokenizes fMRI volumes using a pre-trained 2D natural image autoencoder. Each 3D fMRI volume is compressed into a compact set of continuous tokens, enabling long-sequence modeling with a simple Transformer encoder with limited VRAM. Across large-scale benchmarks including the UK-Biobank (UKB), Human Connectome Project (HCP), and ADHD-200 datasets, TABLeT outperforms existing models in multiple…
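The abstract says the compact tokens are processed by "a simple Transformer encoder." The sketch below shows one generic pre-norm encoder layer (single-head self-attention plus a ReLU MLP) applied to a token sequence. It is not the paper's architecture. The head count, widths, activation, and absence of learned LayerNorm parameters are all assumptions. Position embeddings, which the model would need to tell slices and time points apart, are omitted. The main point is that the attention score matrix is `seq_len × seq_len`. Its size therefore grows quadratically with sequence length, so shortening the sequence per volume directly extends the temporal window that fits in memory.

```python
import numpy as np

def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """LayerNorm over the feature axis (no learned scale/shift, for brevity)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def init_params(d_model: int, d_ff: int, rng: np.random.Generator) -> dict:
    """Random weights for one encoder layer (hypothetical sizes)."""
    s = 1.0 / np.sqrt(d_model)
    return {
        "wq": rng.standard_normal((d_model, d_model)) * s,
        "wk": rng.standard_normal((d_model, d_model)) * s,
        "wv": rng.standard_normal((d_model, d_model)) * s,
        "wo": rng.standard_normal((d_model, d_model)) * s,
        "w1": rng.standard_normal((d_model, d_ff)) * s,
        "w2": rng.standard_normal((d_ff, d_model)) / np.sqrt(d_ff),
    }

def encoder_layer(x: np.ndarray, p: dict) -> tuple[np.ndarray, np.ndarray]:
    """One pre-norm Transformer encoder layer on x of shape (seq_len, d_model).

    Returns the output sequence and the (seq_len, seq_len) attention matrix.
    """
    h = layer_norm(x)
    q, k, v = h @ p["wq"], h @ p["wk"], h @ p["wv"]
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # quadratic in seq_len
    x = x + (attn @ v) @ p["wo"]                     # attention + residual
    h = layer_norm(x)
    x = x + np.maximum(h @ p["w1"], 0.0) @ p["w2"]   # ReLU MLP + residual
    return x, attn

rng = np.random.default_rng(0)
tokens = rng.standard_normal((480, 256))  # e.g. 10 volumes x 48 tokens each
out, attn = encoder_layer(tokens, init_params(256, 512, rng))
print(out.shape, attn.shape)  # (480, 256) (480, 480)
```

For scale, the 480-token sequence gives 230,400 attention scores per head and layer. The same 10 volumes of 64 × 64 × 48 contain 1,966,080 voxels.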
