# Kymata Soto Language Dataset: an electro-magnetoencephalographic dataset for natural speech processing

**Authors:** ChenTianyi Yang, Oliver Parish, Anastasia Klimovich-Gray, Cai Wingfield, William D. Marslen-Wilson, Chao Zhang, Alexandra Woolgar, Andrew Thwaites

PMC · DOI: 10.1038/s41597-026-06579-8 · Scientific Data · 2026-01-20

## TL;DR

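The Kymata Soto Language Dataset provides raw EEG and MEG recordings from 15 native Russian speakers and 20 native English speakers listening to about seven minutes of conversational speech in their native language, supporting research into brain responses to naturalistic speech.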

## Contribution

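The paper introduces an open, BIDS-organized EEG and MEG dataset for studying natural speech processing, released together with the audio stimuli, transcriptions with phoneme- and word-level timestamps, and validation code.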

## Key findings

- Python-based validation analyses revealed consistent low-level loudness perception trends in both the Russian and English speaker groups.
- The dataset is organized using BIDS, supporting reproducible and transparent research.
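- Each participant heard the same speech stimulus repeatedly (four times for Russian speakers, eight for English speakers), giving multiple recordings of the brain response to identical speech.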
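
To make the BIDS organization mentioned above more concrete, here is a minimal sketch of how a BIDS-style relative path for one recording can be assembled. The entity order (`sub`, `task`, `run`) and the `meg` datatype folder follow the general BIDS naming convention. The task label `listening`, the `.fif` extension, and the function name `bids_recording_path` are assumptions for illustration only; the actual labels and file format used in this dataset are not stated in this summary.

```python
from pathlib import PurePosixPath


def bids_recording_path(
    subject: str,
    task: str,
    run: int,
    datatype: str = "meg",
    extension: str = ".fif",  # assumed file format, not confirmed by the source
) -> PurePosixPath:
    """Build a BIDS-style relative path for one recording (illustrative only)."""
    # BIDS labels may only contain letters and digits.
    if not subject.isalnum() or not task.isalnum():
        raise ValueError("BIDS labels must be alphanumeric")
    # Entities appear in a fixed order, followed by the datatype suffix.
    stem = f"sub-{subject}_task-{task}_run-{run:02d}_{datatype}"
    return PurePosixPath(f"sub-{subject}") / datatype / f"{stem}{extension}"


# Hypothetical subject and task labels; one path per stimulus repetition.
example = bids_recording_path("01", "listening", run=3)
print(example)
```

With one run per repetition, a layout like this would let each presentation of the stimulus be addressed individually, though the dataset itself may organize repetitions differently.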

## Abstract

The Kymata Soto Language Dataset comprises raw electroencephalographic (EEG) and magnetoencephalographic (MEG) recordings from 15 native Russian speakers and 20 native English speakers as they listened to approximately seven minutes of conversational speech in their respective native languages. Each participant heard the same conversational speech stimulus multiple times (four repetitions for Russian speakers and eight for English speakers). The dataset includes transcriptions of the recordings, along with timestamp annotations for each phoneme and word. Organized according to the Brain Imaging Data Structure (BIDS), this dataset facilitates in-depth research into brain responses to naturalistic speech. To validate the dataset and our preprocessing pipeline, we employed Python-based analyses, revealing consistent low-level loudness perception trends across both language groups. All EEG and MEG data, audio recordings, transcriptions with timestamp annotations, and validation code are open source, promoting transparency and reproducibility.
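
The participant counts and repetition numbers in the abstract imply a rough amount of recorded listening time. The sketch below works through that arithmetic. It treats the stimulus as exactly seven minutes, whereas the abstract says "approximately seven minutes", so the totals are estimates rather than figures reported by the authors.

```python
# Nominal stimulus length; the abstract gives "approximately seven minutes".
STIMULUS_MINUTES = 7

# Group sizes and repetition counts as stated in the abstract.
GROUPS = {
    "Russian": {"participants": 15, "repetitions": 4},
    "English": {"participants": 20, "repetitions": 8},
}


def listening_minutes(participants, repetitions, stimulus_minutes=STIMULUS_MINUTES):
    """Return (minutes per participant, minutes for the whole group)."""
    per_participant = repetitions * stimulus_minutes
    return per_participant, per_participant * participants


summary = {name: listening_minutes(**group) for name, group in GROUPS.items()}
total_minutes = sum(group_total for _, group_total in summary.values())

for name, (per_participant, group_total) in summary.items():
    print(f"{name}: ~{per_participant} min per participant, ~{group_total} min in total")
print(f"Whole dataset: ~{total_minutes} min (~{total_minutes / 60:.1f} h)")
```

Under these assumptions, each Russian speaker contributes about 28 minutes of stimulus-locked recording and each English speaker about 56 minutes, for roughly 1,540 minutes (about 25.7 hours) across the dataset.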

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12916794/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12916794/full.md

## References

18 references — full list in the complete paper: https://tomesphere.com/paper/PMC12916794/full.md

---
Source: https://tomesphere.com/paper/PMC12916794