On the Shape of Brainscores for Large Language Models (LLMs)

Jingkai Li

arXiv:2405.06725·q-bio.NC·May 16, 2024

TL;DR

This study examines the "Brainscore" metric, which evaluates how similar LLMs are to human brain activity. It builds topological features from human fMRI data and from LLMs, then uses linear regression models and statistical analysis to interpret the score.

Contribution

It analyzes the Brainscore metric by constructing topological features from human fMRI data and from LLMs, then identifying feature combinations that help interpret scores for specific brain regions of interest (ROIs) and hemispheres.

Findings

01. Identified feature combinations that interpret Brainscore across brain regions.

02. Demonstrated the validity of certain topological features for understanding LLM-brain similarity.

03. First interdisciplinary study to analyze the Brainscore metric in depth.

Abstract

With the rise of Large Language Models (LLMs), the novel metric "Brainscore" emerged as a means to evaluate the functional similarity between LLMs and human brain/neural systems. We set out to uncover the meaning of this score by constructing topological features derived from both human fMRI data from 190 subjects and 39 LLMs plus their untrained counterparts. We then trained 36 linear regression models and conducted statistical analyses to identify which of the constructed features are reliable and valid. Our findings reveal distinctive feature combinations that help interpret existing brainscores across various brain regions of interest (ROIs) and hemispheres, contributing to interpretable machine learning (iML) studies. The study is enriched by our further discussions and analyses concerning existing…
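The regression step described in the abstract can be illustrated with a minimal sketch. It fits one ordinary least squares model that predicts a score from a few features and reports the coefficients and R². Everything below is an assumption for illustration only: the data are synthetic, the feature count and the helper name `fit_linear_model` are hypothetical, and the paper's actual features, targets, and fitting procedure are not given in this listing.

```python
import numpy as np

def fit_linear_model(X, y):
    """Ordinary least squares with an intercept.

    Returns (intercept, coefficients, r_squared).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first fitted weight is the intercept.
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return beta[0], beta[1:], 1.0 - ss_res / ss_tot

# Synthetic stand-in data (NOT from the paper): one row per model,
# one column per hypothetical topological feature.
rng = np.random.default_rng(0)
n_models, n_features = 39, 3
X = rng.normal(size=(n_models, n_features))
true_coefs = np.array([0.5, -0.2, 0.0])  # third feature is uninformative
y = 0.1 + X @ true_coefs + rng.normal(scale=0.01, size=n_models)

intercept, coefs, r2 = fit_linear_model(X, y)
print("intercept:", intercept)
print("coefficients:", coefs)
print("R^2:", r2)
```

In a setup like this, one such model would be fitted per target (for example, per ROI and hemisphere), and features whose coefficients stay near zero across fits would be candidates to drop. That reading is a generic use of linear regression, not a description of the authors' procedure.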

Code & Models

Datasets

Kylan12/Synthetic-AI-ML-Dataset
dataset · 42 dl

Taxonomy

Topics: Topic Modeling

Methods: Linear Regression