TL;DR
This paper presents a standardized evaluation framework for language-brain encoding experiments, enabling consistent comparison of language models' ability to predict brain responses across multiple datasets.
Contribution
It introduces a unified evaluation setup, tests sensitivity to data randomness, and analyzes voxel selection effects, promoting transparency and reproducibility.
Findings
Evaluation measures are sensitive to randomized data.
Voxel selection methods significantly impact results.
The framework is publicly available for future research.
Abstract
Language-brain encoding experiments evaluate the ability of language models to predict brain responses elicited by language stimuli. The evaluation scenarios for this task have not yet been standardized, which makes it difficult to compare and interpret results. We perform a series of evaluation experiments with a consistent encoding setup and compute the results for multiple fMRI datasets. In addition, we test the sensitivity of the evaluation measures to randomized data and analyze the effect of voxel selection methods. Our experimental framework is publicly available to make modelling decisions more transparent and support reproducibility for future comparisons.
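To make the three ingredients named in the abstract concrete (an encoding model, a randomized control, and voxel selection), here is a minimal sketch on synthetic data. It is not the paper's framework: the abstract does not specify the regression model, the evaluation measure, or the selection rule, so ridge regression, voxel-wise Pearson correlation, and top-k selection on training fit are all assumptions chosen because they are common in encoding studies. Every function name and parameter below is hypothetical.

```python
import numpy as np

def ridge_fit(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha I)^-1 X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

def voxelwise_pearson(Y_true, Y_pred):
    """Pearson correlation between true and predicted responses, per voxel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    return (yt * yp).sum(axis=0) / np.where(denom == 0, 1.0, denom)

def evaluate(X_train, Y_train, X_test, Y_test, alpha=1.0):
    """Fit on the training split, score each voxel on the held-out split."""
    W = ridge_fit(X_train, Y_train, alpha)
    return voxelwise_pearson(Y_test, X_test @ W)

# Synthetic stand-ins (assumption): X plays the role of language-model
# features per stimulus, Y the role of fMRI responses per voxel.
rng = np.random.default_rng(0)
n_train, n_test, n_feat, n_vox = 200, 50, 10, 30
X = rng.standard_normal((n_train + n_test, n_feat))
W_true = rng.standard_normal((n_feat, n_vox))
W_true[:, 15:] = 0.0  # the last 15 voxels carry no stimulus signal
Y = X @ W_true + rng.standard_normal((n_train + n_test, n_vox))
X_tr, X_te = X[:n_train], X[n_train:]
Y_tr, Y_te = Y[:n_train], Y[n_train:]

# 1) Encoding score with correctly aligned stimuli and responses.
real_scores = evaluate(X_tr, Y_tr, X_te, Y_te)

# 2) Randomized control: shuffle the training stimuli so they no longer
#    line up with the responses. A useful measure should drop here.
perm = rng.permutation(n_train)
random_scores = evaluate(X_tr[perm], Y_tr, X_te, Y_te)

# 3) Voxel selection: keep the k voxels with the best training fit.
#    Selecting on the training split avoids leaking test information.
train_scores = voxelwise_pearson(Y_tr, X_tr @ ridge_fit(X_tr, Y_tr))
top_k = np.argsort(train_scores)[-10:]

print("mean r, all voxels:      ", real_scores.mean())
print("mean r, randomized:      ", random_scores.mean())
print("mean r, selected voxels: ", real_scores[top_k].mean())
```

In this toy setting the aggregate score depends heavily on which voxels are averaged, which is one reason a reported number is hard to interpret without knowing the selection rule and the score obtained on randomized data.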
