Linguistic properties and model scale in brain encoding: from small to compressed language models
Subba Reddy Oota, Vijay Rowtula, Satya Sai Srinath Namburi, Khushbu Pahwa, Anant Khandelwal, Manish Gupta, Tanmoy Chakraborty, Bapi S. Raju

TL;DR
Scaling language models enhances their alignment with human brain activity, but smaller and compressed models can achieve similar brain predictivity, indicating that large size is not strictly necessary for brain-relevant representations.
Contribution
This study systematically compares the effects of model scale and compression on brain alignment, revealing that modest models and compressed variants can match larger models in neural predictivity.
Findings
3B models match larger models in brain predictivity
Compression methods largely preserve neural alignment
Brain predictivity saturates at modest model sizes
Abstract
Recent work has shown that scaling large language models (LLMs) improves their alignment with human brain activity, yet it remains unclear what drives these gains and which representational properties are responsible. Although larger models often yield better task performance and brain alignment, they are increasingly difficult to analyze mechanistically. This raises a fundamental question: what is the minimal model capacity required to capture brain-relevant representations? To address this question, we systematically investigate how constraining model scale and numerical precision affects brain alignment. We compare full-precision LLMs, small language models (SLMs), and compressed variants (quantized and pruned) by predicting fMRI responses during naturalistic language comprehension. Across model families up to 14B parameters, we find that 3B SLMs achieve brain predictivity…
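As a rough illustration of what "predicting fMRI responses" from model representations can involve, the following is a minimal sketch of a voxel-wise encoding model. It assumes ridge regression from language-model features to voxel responses, scored by per-voxel Pearson correlation on held-out data. The function names, the regularization strength `alpha`, the 80/20 contiguous split, and the synthetic data are all illustrative assumptions and are not taken from the paper; steps such as hemodynamic delay modeling and cross-validated selection of `alpha` are omitted.

```python
import numpy as np

def fit_ridge(X, Y, alpha):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)

def voxelwise_pearson(Y_true, Y_pred):
    """Pearson correlation between true and predicted responses, one value per voxel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    return (yt * yp).sum(axis=0) / denom

def encoding_score(X, Y, alpha=1.0, train_frac=0.8):
    """Fit on the first part of the time series, score on the held-out remainder.

    X: (n_samples, n_features) model features (hypothetical input).
    Y: (n_samples, n_voxels) fMRI responses (hypothetical input).
    A contiguous split is used because fMRI samples are temporally autocorrelated.
    """
    n_train = int(train_frac * len(X))
    X_tr, X_te = X[:n_train], X[n_train:]
    Y_tr, Y_te = Y[:n_train], Y[n_train:]

    # Standardize features using training statistics only.
    mu, sd = X_tr.mean(axis=0), X_tr.std(axis=0) + 1e-8
    X_tr, X_te = (X_tr - mu) / sd, (X_te - mu) / sd

    # Center responses so that no explicit intercept column is needed.
    y_mu = Y_tr.mean(axis=0)
    W = fit_ridge(X_tr, Y_tr - y_mu, alpha)
    Y_pred = X_te @ W + y_mu
    return voxelwise_pearson(Y_te, Y_pred)

# Synthetic stand-in data: 400 time points, 16 features, 5 voxels.
rng = np.random.default_rng(0)
n, d, v = 400, 16, 5
X = rng.standard_normal((n, d))
W_true = rng.standard_normal((d, v))
Y = X @ W_true + 0.5 * rng.standard_normal((n, v))

scores = encoding_score(X, Y, alpha=1.0)
print("mean held-out correlation:", scores.mean())
```

Under this kind of setup, comparing a full-precision model with a small or compressed one would amount to swapping the feature matrix `X` and comparing the resulting per-voxel scores.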
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
1. The study addresses an emerging question about model efficiency and neural alignment, offering a systematic comparison between model size, quantization, and brain predictivity. The experimental framework—combining fMRI encoding and linguistic probing—is technically sound and clearly presented. 2. The paper is well written and carefully structured, with clear visualizations and comprehensive references to recent brain–language modeling literature. 3. The attempt to connect linguistic task perf
1. The overall motivation and significance are limited. While scaling and efficiency are relevant engineering questions, it is unclear why quantized models are meaningful for cognitive neuroscience. Brain–language alignment research is primarily driven by scientific, not deployment, objectives, so the practical incentive for compressing models in this context is weak. As presented, the work is more an engineering benchmark than a scientific contribution. 2. The dataset and experimental scope are
Timely and relevant topic. The paper addresses an important and underexplored question: how scaling and compression strategies affect the brain-alignment properties of language models. This is of both scientific and practical interest for NeuroAI research. Comprehensive evaluation design. By combining voxel-wise encoding models with a linguistic benchmark (FlashHolmes), the study bridges neural and computational levels of analysis and identifies which linguistic properties support brain alignme
**Lack of statistical validation.** Figure 2 compares alignment across quantization methods for the same model (e.g., Qwen), but no statistical tests, confidence intervals, or measures of variability across runs or participants are reported. Consequently, it is unclear whether observed differences, such as the apparent improvement after AWQ compression, are meaningful or fall within noise. Without proper significance testing, it is difficult to interpret whether certain compression methods relia
Originality: First to jointly study model scale, compression, and brain alignment in a controlled neuroimaging setting. Quality: Strong empirical foundation with multiple model families, quantization methods, and brain regions. Clarity: Clear distinction between brain alignment and task performance, with nuanced interpretation of their divergence. Significance: Offers practical recommendations for neuroAI applications, especially in low-latency or low-resource environments like BCIs.
1. Model scale upper limit: The largest model evaluated is 8B; whether the findings extrapolate to 13B+ models (DeepSeek-R1-Distill-Qwen-14B, Qwen3-32B, DeepSeek-R1-Distill-Qwen-32B, etc.) remains unclear. 2. Compression scope: The motivation and experimental design of this article are not convincing. Only post-training quantization is studied; pruning, distillation, or structured compression are not explored. 3. Modality limitation: Only fMRI is used; MEG or ECoG could reveal temporal dynamics of alignme
Taxonomy
Topics: Neurobiology of Language and Bilingualism · Epilepsy research and treatment · Ferroelectric and Negative Capacitance Devices
