Can Natural Image Autoencoders Compactly Tokenize fMRI Volumes for Long-Range Dynamics Modeling?

Peter Yongho Kim; Juhyeon Park; Jungwoo Park; Jubin Choi; Jungwoo Seo; Jiook Cha; Taesup Moon

arXiv:2604.03619 · cs.CV · April 7, 2026

TL;DR

This paper introduces TABLeT, which tokenizes fMRI volumes with a pre-trained 2D natural image autoencoder so that a Transformer encoder can model long-range spatiotemporal dynamics efficiently. It outperforms existing models on the UKB, HCP, and ADHD-200 benchmarks.

Contribution

The paper compresses each 3D fMRI volume into a compact set of continuous tokens using a pre-trained 2D natural image autoencoder, which makes long-sequence modeling with a simple Transformer encoder feasible on limited VRAM.
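To make the idea of tokenizing a 3D volume with a 2D encoder concrete, here is a minimal sketch in plain Python. It is not the authors' pipeline: `encode_slice_stub` is a hypothetical stand-in (simple average pooling) for a pre-trained 2D autoencoder's encoder, and the choices to split the volume along one axis and to emit one token vector per slice are assumptions made only for illustration.

```python
from typing import List

Slice2D = List[List[float]]


def encode_slice_stub(slice_2d: Slice2D, factor: int) -> List[float]:
    """Hypothetical stand-in for a pre-trained 2D encoder.

    Average-pools non-overlapping factor x factor blocks and returns the
    pooled values as one flat latent vector.
    """
    height, width = len(slice_2d), len(slice_2d[0])
    latents = []
    for r in range(0, height - factor + 1, factor):
        for c in range(0, width - factor + 1, factor):
            block = [slice_2d[r + i][c + j]
                     for i in range(factor) for j in range(factor)]
            latents.append(sum(block) / len(block))
    return latents


def tokenize_volume(volume: List[Slice2D], factor: int) -> List[List[float]]:
    """Encode each 2D slice of one 3D volume (assumed layout: one token per slice)."""
    return [encode_slice_stub(s, factor) for s in volume]


# Toy volume: 4 slices of 8 x 8 voxels (sizes are illustrative only).
volume = [[[float(z + y + x) for x in range(8)] for y in range(8)]
          for z in range(4)]
tokens = tokenize_volume(volume, factor=4)
print(len(tokens), len(tokens[0]))  # number of tokens, latent size per token
```

In a real setup the stub would be replaced by the frozen encoder of an actual image autoencoder, and the resulting tokens from consecutive volumes would be concatenated into one sequence for the Transformer.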

Findings

01. Outperforms existing models on the UKB, HCP, and ADHD-200 datasets.

02. Achieves better computational and memory efficiency than voxel-based methods.

03. Self-supervised masked token pre-training enhances downstream task performance.
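The masked token pre-training mentioned in the findings can be illustrated with a generic masked-modeling sketch. The listing does not give the masking ratio, mask value, or loss used in the paper, so `mask_ratio`, `mask_value`, and the mean-squared-error objective below are all assumptions; the code shows only the general pattern of hiding some tokens and scoring reconstruction on the hidden positions.

```python
import random
from typing import List, Tuple

Tokens = List[List[float]]


def mask_tokens(tokens: Tokens, mask_ratio: float,
                mask_value: float = 0.0, seed: int = 0) -> Tuple[Tokens, List[int]]:
    """Replace a random subset of token vectors with a constant mask value."""
    rng = random.Random(seed)
    n_masked = max(1, round(mask_ratio * len(tokens)))
    masked_idx = sorted(rng.sample(range(len(tokens)), n_masked))
    corrupted = [list(t) for t in tokens]  # copy so the input stays intact
    for i in masked_idx:
        corrupted[i] = [mask_value] * len(tokens[i])
    return corrupted, masked_idx


def masked_mse(pred: Tokens, target: Tokens, masked_idx: List[int]) -> float:
    """Mean squared error computed only over the masked positions."""
    errs = [(p - t) ** 2
            for i in masked_idx
            for p, t in zip(pred[i], target[i])]
    return sum(errs) / len(errs)


# Toy sequence of 10 two-dimensional tokens (values are illustrative only).
tokens = [[float(i), float(i) + 0.5] for i in range(10)]
corrupted, masked_idx = mask_tokens(tokens, mask_ratio=0.3)

# A model would predict the original tokens from `corrupted`; here the
# corrupted input itself serves as a trivial baseline prediction.
baseline_loss = masked_mse(corrupted, tokens, masked_idx)
print(masked_idx, baseline_loss)
```

During pre-training, a model's predictions would take the place of `corrupted` in the loss call, and the loss would be minimized over many sequences.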

Abstract

Modeling long-range spatiotemporal dynamics in functional Magnetic Resonance Imaging (fMRI) remains a key challenge due to the high dimensionality of the four-dimensional signals. Prior voxel-based models, although demonstrating excellent performance and interpretation capabilities, are constrained by prohibitive memory demands and thus can only capture limited temporal windows. To address this, we propose TABLeT (Two-dimensionally Autoencoded Brain Latent Transformer), a novel approach that tokenizes fMRI volumes using a pre-trained 2D natural image autoencoder. Each 3D fMRI volume is compressed into a compact set of continuous tokens, enabling long-sequence modeling with a simple Transformer encoder with limited VRAM. Across large-scale benchmarks including the UK-Biobank (UKB), Human Connectome Project (HCP), and ADHD-200 datasets, TABLeT outperforms existing models in multiple…
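The link between tokens per volume and the temporal window can be shown with a little arithmetic. Full self-attention compares every token with every other, so its cost grows with the square of the sequence length. The sketch below assumes a fixed sequence-length budget and asks how many fMRI frames fit within it. All three constants are hypothetical and are not taken from the paper, and prior voxel-based models do not necessarily use full attention over all patches.

```python
def attention_pairs(tokens_per_volume: int, num_frames: int) -> int:
    """Number of token pairs scored by full self-attention over the sequence."""
    seq_len = tokens_per_volume * num_frames
    return seq_len * seq_len


def max_frames(tokens_per_volume: int, seq_budget: int) -> int:
    """Largest number of frames whose tokens fit in a fixed sequence budget."""
    return seq_budget // tokens_per_volume


# Hypothetical numbers, chosen only to illustrate the scaling.
SEQ_BUDGET = 8192          # assumed maximum sequence length that fits in memory
VOXEL_PATCH_TOKENS = 512   # assumed tokens per volume for a voxel-patch model
COMPACT_TOKENS = 32        # assumed tokens per volume after 2D autoencoding

voxel_frames = max_frames(VOXEL_PATCH_TOKENS, SEQ_BUDGET)
compact_frames = max_frames(COMPACT_TOKENS, SEQ_BUDGET)
print(voxel_frames, compact_frames)  # 16 256
```

Under these assumed numbers, using 16 times fewer tokens per volume lets the same budget cover 16 times more frames, which is the kind of trade-off the abstract describes.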

Code & Models

Repositories

beotborry/TABLeT
github

Models

🤗
beotborry/TABLeT_pretrained
model · 42 dl
