Scaling laws for language encoding models in fMRI
Richard Antonello, Aditya Vaidya, and Alexander G. Huth

TL;DR
This study shows that larger transformer-based language models and larger fMRI training sets both improve brain response prediction, with performance scaling logarithmically and approaching the theoretical maximum in some brain areas.
Contribution
It provides the first comprehensive analysis of how scaling language models and data size enhances brain response prediction accuracy in fMRI studies.
Findings
Brain prediction performance scales logarithmically with model size.
Scaling model size from 125M to 30B parameters raises encoding performance by about 15%, and scaling the fMRI training set shows similar logarithmic behavior.
Performance nears the theoretical maximum in some brain regions.
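To make "scales logarithmically" concrete, here is a minimal sketch of fitting a log-linear trend, score = a + b * log10(parameters), with ordinary least squares. The function name `fit_log_scaling`, the model sizes, and the scores are illustrative placeholders, not values from the paper; the scores are generated from a known line so the fit can be checked.

```python
import numpy as np

def fit_log_scaling(param_counts, scores):
    """Least-squares fit of score = a + b * log10(param_count)."""
    x = np.log10(np.asarray(param_counts, dtype=float))
    y = np.asarray(scores, dtype=float)
    # np.polyfit returns coefficients from highest degree down: [slope, intercept]
    b, a = np.polyfit(x, y, deg=1)
    return a, b

# Hypothetical model sizes spanning the 125M to 30B range named in the abstract.
sizes = np.array([125e6, 350e6, 1.3e9, 2.7e9, 6.7e9, 13e9, 30e9])

# Synthetic scores drawn from a known log-linear law (NOT measured data).
true_a, true_b = -0.1, 0.02
scores = true_a + true_b * np.log10(sizes)

a, b = fit_log_scaling(sizes, scores)
print(f"intercept={a:.4f}, slope per decade of parameters={b:.4f}")
```

Under this form, each tenfold increase in parameter count adds a constant increment `b` to the score, which is why gains shrink in relative terms as models grow.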
Abstract
Representations from transformer-based unidirectional language models are known to be effective at predicting brain responses to natural language. However, most studies comparing language models to brains have used GPT-2 or similarly sized language models. Here we tested whether larger open-source models such as those from the OPT and LLaMA families are better at predicting brain responses recorded using fMRI. Mirroring scaling results from other contexts, we found that brain prediction performance scales logarithmically with model size from 125M to 30B parameter models, with ~15% increased encoding performance as measured by correlation with a held-out test set across 3 subjects. Similar logarithmic behavior was observed when scaling the size of the fMRI training set. We also characterized scaling for acoustic encoding models that use HuBERT, WavLM, and Whisper, and we found comparable…
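The abstract measures encoding performance as correlation with a held-out test set. A minimal sketch of that evaluation follows, assuming a linear map from model features to voxel responses fitted with ridge regression. Ridge is a common choice for encoding models, but the abstract shown here does not specify the regression method, so treat it as an assumption. All data is synthetic, and the names `fit_ridge` and `voxelwise_correlation` are hypothetical.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)

def voxelwise_correlation(Y_true, Y_pred):
    """Pearson correlation between true and predicted responses, per voxel (column)."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    return (yt * yp).sum(axis=0) / denom

# Synthetic stand-ins: rows are time points, columns are features or voxels.
rng = np.random.default_rng(0)
n_train, n_test, n_features, n_voxels = 400, 100, 20, 5
W_true = rng.normal(size=(n_features, n_voxels))
X_train = rng.normal(size=(n_train, n_features))
X_test = rng.normal(size=(n_test, n_features))
Y_train = X_train @ W_true + rng.normal(size=(n_train, n_voxels))
Y_test = X_test @ W_true + rng.normal(size=(n_test, n_voxels))

# Fit on the training set, then score predictions on held-out data only.
W = fit_ridge(X_train, Y_train, alpha=10.0)
r = voxelwise_correlation(Y_test, X_test @ W)
print("held-out correlation per voxel:", np.round(r, 3))
```

Scoring on data never used for fitting is what keeps the correlation from rewarding overfitting, which matters more as feature dimensionality grows with model size.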
Taxonomy
Topics: Machine Learning in Materials Science · Functional Brain Connectivity Studies · Speech Recognition and Synthesis
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Cosine Annealing · Residual Connection · Dense Connections · Dropout · Byte Pair Encoding
