# Subword Representations Successfully Decode Brain Responses to Morphologically Complex Written Words

**Authors:** Tero Hakala, Tiina Lindh-Knuutila, Annika Hultén, Minna Lehtonen, Riitta Salmelin

PMC · DOI: 10.1162/nol_a_00149 · 2024-09-11

## TL;DR

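MEG responses to morphologically complex Finnish words can be decoded with word vectors composed of subword segments. Only morphologically aware segmentations pass the stricter segment-label permutation test, which suggests that the brain represents both whole-word forms and their morphemes.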

## Contribution

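The study extends corpus-semantic decoding of word-evoked brain activity to multimorphemic words in Finnish, an agglutinative language. It compares word vectors built from whole words, linguistic morphemes, statistical morphemes, random segments, and character-level 1-, 2-, and 3-grams.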

## Key findings

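- For all segmentation variants, decoding accuracy exceeded the standard word-label permutation-based significance thresholds at 350–500 ms after stimulus onset.
- Under the critical segment-label permutation test, only the morphologically aware segmentations reached significance.
- Neural decoding with corpus-semantic word vectors composed from subword segments also works for multimorphemic word forms, which matters for morphologically complex languages where many word forms are too rare to get reliable whole-word representations from a corpus.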
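
The permutation-based significance thresholds mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `permutation_threshold`, the scoring callback, the number of permutations, and the 5% level are all assumptions made for illustration. The idea is to shuffle the labels many times, score each shuffle, and take a high percentile of the resulting null distribution as the threshold. For a segment-label variant, one would presumably shuffle which vector is assigned to which segment before recomposing the word vectors; that reading is also an assumption.

```python
import random
from typing import Callable, List, Sequence


def permutation_threshold(
    labels: Sequence[int],
    score_fn: Callable[[List[int]], float],
    n_permutations: int = 1000,   # assumed value, not from the paper
    alpha: float = 0.05,          # assumed significance level
    seed: int = 0,
) -> float:
    """Return the (1 - alpha) quantile of scores obtained with shuffled labels."""
    rng = random.Random(seed)
    null_scores = []
    for _ in range(n_permutations):
        shuffled = list(labels)
        rng.shuffle(shuffled)              # break the true label assignment
        null_scores.append(score_fn(shuffled))
    null_scores.sort()
    index = min(int((1 - alpha) * n_permutations), n_permutations - 1)
    return null_scores[index]


# Toy stand-in for a decoding score: fraction of labels left in place.
true_labels = list(range(20))


def toy_score(candidate: List[int]) -> float:
    matches = sum(a == b for a, b in zip(candidate, true_labels))
    return matches / len(true_labels)


threshold = permutation_threshold(true_labels, toy_score)
observed = toy_score(true_labels)
is_significant = observed > threshold
```

In a real analysis, `score_fn` would rerun the decoding with the shuffled assignment; the toy score here only demonstrates the mechanics of building a null distribution.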

## Abstract

This study extends the idea of decoding word-evoked brain activations using a corpus-semantic vector space to multimorphemic words in the agglutinative Finnish language. The corpus-semantic models are trained on word segments, and decoding is carried out with word vectors that are composed of these segments. We tested several alternative vector-space models using different segmentations: no segmentation (whole word), linguistic morphemes, statistical morphemes, random segmentation, and character-level 1-, 2- and 3-grams, and paired them with recorded MEG responses to multimorphemic words in a visual word recognition task. For all variants, the decoding accuracy exceeded the standard word-label permutation-based significance thresholds at 350–500 ms after stimulus onset. However, the critical segment-label permutation test revealed that only those segmentations that were morphologically aware reached significance in the brain decoding task. The results suggest that both whole-word forms and morphemes are represented in the brain and show that neural decoding using corpus-semantic word representations derived from compositional subword segments is applicable also for multimorphemic word forms. This is especially relevant for languages with complex morphology, because a large proportion of word forms are rare and it can be difficult to find statistically reliable surface representations for them in any large corpus.
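
As a rough illustration of how a word vector can be composed from segment vectors, the sketch below splits a word into character n-grams and averages the vectors of its segments. The abstract does not state the composition rule or whether the n-grams overlap, so the averaging, the overlapping windows, the function names, and the toy two-dimensional vectors are all assumptions. The example word is Finnish "talossa" ("in the house"), which consists of the stem "talo" and the inessive suffix "ssa".

```python
from typing import Dict, List


def char_ngrams(word: str, n: int) -> List[str]:
    """Split a word into overlapping character n-grams (overlap is assumed)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def compose_word_vector(
    segments: List[str],
    segment_vectors: Dict[str, List[float]],
) -> List[float]:
    """Average the vectors of all segments that have one (averaging is assumed)."""
    known = [segment_vectors[s] for s in segments if s in segment_vectors]
    if not known:
        raise KeyError("no segment of this word has a vector")
    dim = len(known[0])
    return [sum(vec[d] for vec in known) / len(known) for d in range(dim)]


# Hypothetical toy vectors for a morpheme-level segmentation.
toy_vectors = {"talo": [1.0, 0.0], "ssa": [0.0, 1.0]}
word_vector = compose_word_vector(["talo", "ssa"], toy_vectors)

# Character 3-grams of the same word, as one of the alternative segmentations.
trigrams = char_ngrams("talossa", 3)
```

The same `compose_word_vector` call works for any segmentation, so the alternatives listed in the abstract differ only in how the segment list is produced.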

## Full-text entities

- **Diseases:** TECHNICAL TERMS (MESH:D000088562), blinks (MESH:D000092164), neurological problems (MESH:D009461)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11410357/full.md

---
Source: https://tomesphere.com/paper/PMC11410357