# Robust Evaluation of Language-Brain Encoding Experiments

**Authors:** Lisa Beinborn, Samira Abnar, Rochelle Choenni

arXiv: 1904.02547 · 2019-04-05

## TL;DR

This paper presents a standardized evaluation framework for language-brain encoding experiments, enabling consistent comparison of language models' ability to predict brain responses across multiple datasets.
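The encoding setup can be made concrete with a minimal linear encoding model that maps language-model representations of stimuli to fMRI voxel responses. This sketch assumes ridge regression because it is a common choice for this task. The summary does not state the paper's exact regression model, and `fit_ridge` and `predict` are hypothetical names. The sketch also assumes centered data (no intercept) and uses only the Python standard library.

```python
# Minimal ridge-regression encoding model (illustrative sketch, standard library only).
# Assumption: X holds one language-model vector per stimulus (n_stimuli x n_features),
# Y holds fMRI responses (n_stimuli x n_voxels); both are assumed centered (no intercept).
from typing import List

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve A W = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [ra[:] + rb[:] for ra, rb in zip(a, b)]  # augmented matrix [A | B]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * pv for v, pv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]  # the right-hand block now holds W


def fit_ridge(x: Matrix, y: Matrix, alpha: float) -> Matrix:
    """Closed-form ridge weights W = (X^T X + alpha * I)^-1 X^T Y, one column per voxel."""
    xt = transpose(x)
    gram = matmul(xt, x)
    for i in range(len(gram)):
        gram[i][i] += alpha
    return solve(gram, matmul(xt, y))


def predict(x: Matrix, w: Matrix) -> Matrix:
    """Predicted voxel responses for each stimulus."""
    return matmul(x, w)
```

With `alpha = 0` this reduces to ordinary least squares. A larger `alpha` shrinks the weights, which is needed when there are more features than stimuli and `X^T X` is singular.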

## Contribution

The paper introduces a unified evaluation setup, tests how sensitive the evaluation measures are to randomized data, and analyzes the effect of voxel selection methods. The goal is to make modelling decisions more transparent and results more reproducible.
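A unified evaluation setup needs an agreed scoring function. The sketch below implements the leave-two-out, 2-vs-2 pairwise test, which is widely used in language-brain encoding work. This summary does not say whether the paper uses exactly this variant, so treat `pairwise_accuracy` (a hypothetical name) and the choice of cosine similarity as assumptions.

```python
# Illustrative 2-vs-2 pairwise accuracy (assumed measure, not confirmed as the paper's exact variant).
import math
from itertools import combinations
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pairwise_accuracy(predicted: List[List[float]], observed: List[List[float]]) -> float:
    """Fraction of stimulus pairs where the correct matching scores higher than the swapped one."""
    correct, total = 0, 0
    for i, j in combinations(range(len(predicted)), 2):
        matched = cosine(predicted[i], observed[i]) + cosine(predicted[j], observed[j])
        swapped = cosine(predicted[i], observed[j]) + cosine(predicted[j], observed[i])
        if matched > swapped:  # ties count as failures
            correct += 1
        total += 1
    return correct / total if total else float("nan")
```

Chance level is 0.5. Counting ties as failures is a design choice in this sketch, not a fixed convention.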

## Key findings

- Evaluation measures are sensitive to randomized data.
- The choice of voxel selection method has a substantial effect on the results.
- The framework is publicly available for future research.
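A permutation test is one way to probe how sensitive an evaluation measure is to randomized data. It shuffles which observed response belongs to which stimulus, re-scores the predictions, and checks how often the shuffled score reaches the real one. The sketch below assumes mean per-voxel Pearson correlation as the measure. It is a generic sanity check, not necessarily the randomization procedure used in the paper, and all function names are hypothetical.

```python
# Illustrative permutation check for an evaluation measure (assumed procedure, not the paper's).
import math
import random
from typing import List, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


def mean_voxel_correlation(predicted: List[List[float]], observed: List[List[float]]) -> float:
    """Average over voxels of the correlation between predicted and observed responses across stimuli."""
    n_voxels = len(observed[0])
    return sum(
        pearson([p[v] for p in predicted], [o[v] for o in observed]) for v in range(n_voxels)
    ) / n_voxels


def permutation_p_value(
    predicted: List[List[float]],
    observed: List[List[float]],
    n_permutations: int = 1000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return the real score and the share of stimulus shuffles that reach it (with +1 smoothing)."""
    rng = random.Random(seed)
    real = mean_voxel_correlation(predicted, observed)
    hits = 0
    for _ in range(n_permutations):
        shuffled = observed[:]  # shuffle rows only: breaks the stimulus-response alignment
        rng.shuffle(shuffled)
        if mean_voxel_correlation(predicted, shuffled) >= real:
            hits += 1
    return real, (hits + 1) / (n_permutations + 1)
```

The `+1` in the numerator and denominator keeps the estimated p-value above zero for a finite number of permutations.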

## Abstract

Language-brain encoding experiments evaluate the ability of language models to predict brain responses elicited by language stimuli. The evaluation scenarios for this task have not yet been standardized, which makes it difficult to compare and interpret results. We perform a series of evaluation experiments with a consistent encoding setup and compute the results for multiple fMRI datasets. In addition, we test the sensitivity of the evaluation measures to randomized data and analyze the effect of voxel selection methods. Our experimental framework is publicly available to make modelling decisions more transparent and support reproducibility for future comparisons.
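Voxel selection decides which of the many recorded voxels enter the evaluation. One frequently used criterion keeps voxels whose responses are stable across repeated presentations of the same stimuli. The sketch below scores each voxel by its mean pairwise correlation across runs and keeps the top `k`. This is only one possible method, chosen for illustration. The summary does not say which selection methods the paper compares, and `voxel_stability` and `select_top_k` are hypothetical names.

```python
# Illustrative stability-based voxel selection (one possible method, assumed for this sketch).
import math
from itertools import combinations
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


def voxel_stability(runs: List[List[List[float]]]) -> List[float]:
    """runs[r][s][v] is the response in run r to stimulus s at voxel v.

    Each voxel's score is the mean correlation of its stimulus profile over all pairs of runs.
    """
    n_voxels = len(runs[0][0])
    scores = []
    for v in range(n_voxels):
        profiles = [[stim[v] for stim in run] for run in runs]
        pairs = list(combinations(profiles, 2))
        scores.append(sum(pearson(a, b) for a, b in pairs) / len(pairs))
    return scores


def select_top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring voxels, best first."""
    return sorted(range(len(scores)), key=lambda v: scores[v], reverse=True)[:k]
```

The stability scores should be computed on training data only, so that selection does not leak information from the test stimuli. Because a different criterion changes which voxels are scored, the same predictions can yield different results.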

---
Source: https://tomesphere.com/paper/1904.02547