SIGNAL: Dataset for Semantic and Inferred Grammar Neurological Analysis of Language
Anna Komissarenko, Ekaterina Voloshina, Anastasia Cheveleva, Ilia Semenkov, Oleg Serikov, Alex Ossadtchi

TL;DR
This paper introduces a dataset combining EEG recordings and sentences to study brain and language model alignment.
Contribution
The dataset includes both congruent and incongruent sentences with EEG data, enabling brain-model alignment research.
Findings
The dataset contains 600 sentences with EEG recordings from 21 participants.
Validation confirmed the dataset's suitability for brain-model alignment studies.
Stimuli were assessed by native speakers and used in LLM probing.
Abstract
Recently, the idea of brain-model alignment has been the topic of several influential works. However, most previous studies were based on datasets collected during regular reading tasks, in which subjects were not exposed to linguistic incongruencies and stimuli were not controlled for key linguistic properties. Meanwhile, interpretability studies of Large Language Models pay growing attention to thoroughly designed linguistic tasks based on certain acceptability measures. We present a dataset of 600 sentences, combining congruent sentences with grammatically and/or semantically incongruent ones, coupled with high-density 64-channel EEG recordings from 21 participants. The text stimuli were assessed by native speakers and later used in EEG recording and validation and in LLM probing. The validation results confirmed the suitability of the data for future research on…
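The stimulus design described in the abstract (congruent sentences plus sentences that are grammatically incongruent, semantically incongruent, or both) can be pictured as a small labeled record per sentence. The following is only an illustrative sketch: the `Stimulus` class, its field names, the condition labels, and the toy English sentences are all assumptions made here and do not reflect the dataset's actual schema or content.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Stimulus:
    """One sentence stimulus. Field names are hypothetical, not the dataset's schema."""
    text: str
    grammatical: bool  # False if the sentence contains a grammatical incongruency
    semantic: bool     # False if the sentence contains a semantic incongruency

    @property
    def condition(self) -> str:
        """Map the two flags onto one of four assumed condition labels."""
        if self.grammatical and self.semantic:
            return "congruent"
        if not self.grammatical and not self.semantic:
            return "grammatical+semantic"
        return "semantic" if self.grammatical else "grammatical"


def count_conditions(stimuli):
    """Count how many stimuli fall into each condition."""
    return Counter(s.condition for s in stimuli)


# Toy placeholder sentences (not taken from the dataset).
toy = [
    Stimulus("The cat drinks milk.", grammatical=True, semantic=True),
    Stimulus("The cat drink milk.", grammatical=False, semantic=True),
    Stimulus("The cat drinks gravel.", grammatical=True, semantic=False),
    Stimulus("The cat drink gravel.", grammatical=False, semantic=False),
]
counts = count_conditions(toy)
```

Keeping the two incongruency types as separate flags, rather than a single label, makes it straightforward to contrast grammatical and semantic violations independently when pairing sentences with EEG epochs or model probes.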
Taxonomy
Topics: Neurobiology of Language and Bilingualism · EEG and Brain-Computer Interfaces · Topic Modeling
