# LeCoder: A large-scale automated coder for coding errors in word-production tasks

**Authors:** Shanhua Hu, Delaney DuVal, Brielle C. Stark, Nazbanou Nozari

PMC · DOI: 10.3758/s13428-026-02948-8 · Behavior Research Methods · 2026-02-17

## TL;DR

LeCoder is an automated tool that accurately codes speech errors in word-production tasks, improving objectivity and scalability in language research.

## Contribution

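LeCoder is presented as the first open-source, automated error coder for English word and naming data. It takes a data-driven approach, using large-scale corpora to quantify the target–response relationship, and is reported to be accurate against expert coders and to generalize to new datasets.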

## Key findings

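- On two datasets from two aphasia labs, each hand-coded by trained research assistants, LeCoder shows high accuracy relative to expert human coders, and in certain cases offers a more logical categorization.
- LeCoder's performance generalizes to new participants and items not encountered during training.
- The authors encourage using LeCoder across labs for more objective error coding, which should improve the replicability of findings in subfields that rely on speech error analysis, including neuropsychological research.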
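
The generalization claim depends on evaluating the coder on participants and items that never appear in training. The sketch below shows one way such a held-out split could be built. It is an illustration only, not the authors' evaluation code: the `Trial` record, the `split_unseen` helper, and the toy data are all hypothetical.

```python
from typing import NamedTuple


class Trial(NamedTuple):
    """One naming trial (hypothetical record layout, not LeCoder's format)."""
    participant: str
    target: str
    response: str


def split_unseen(trials, test_participants, test_items):
    """Split trials so the test set shares no participant or item with training."""
    train, test = [], []
    for t in trials:
        participant_held_out = t.participant in test_participants
        item_held_out = t.target in test_items
        if participant_held_out and item_held_out:
            test.append(t)
        elif not participant_held_out and not item_held_out:
            train.append(t)
        # Trials with exactly one held-out factor are dropped, so neither
        # the participant nor the item leaks from training into testing.
    return train, test


# Toy data, invented for illustration.
trials = [
    Trial("P1", "cat", "dog"),
    Trial("P1", "pig", "big"),
    Trial("P2", "cat", "cat"),
    Trial("P2", "pig", "fig"),
]
train, test = split_unseen(trials, test_participants={"P2"}, test_items={"pig"})
```

Dropping the mixed trials costs data but keeps the test set strictly unseen on both dimensions. A looser design would hold out only participants or only items.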

## Abstract

Speech errors have been instrumental in advancing our understanding of the architecture of the language production system, the nature of its representations, and its disorders. To be most informative, researchers usually need large amounts of data. Hand-coding such data can be both cumbersome and subjective. This paper presents LeCoder, the first open-source, automated error coder for English word and naming data, which uses a data-driven approach grounded in large-scale corpora to quantify the target–response relationship, allowing it to be flexible, scalable, and generalizable across new datasets. By testing the coder on two datasets from two aphasia labs that have been carefully coded by trained research assistants, we first establish that LeCoder has high accuracy when compared to expert coders, and in certain cases, offers a more logical categorization than human coders. We then show, using robust machine-learning approaches, that LeCoder’s performance generalizes to new participants and items it has never encountered before. Collectively, these findings encourage the use of LeCoder across labs for more objective coding of speech errors, which will, in turn, increase replicability of findings in all subfields of research that use speech error analysis, including neuropsychological research.
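
The abstract does not spell out how the target–response relationship is quantified. As a rough illustration of the general idea, the sketch below scores form overlap between a target and a response with a normalized edit distance over letters. This is a simple stand-in chosen for the example, not LeCoder's corpus-based measure, and the names `edit_distance` and `form_overlap` are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def form_overlap(target: str, response: str) -> float:
    """Form similarity in [0, 1]; 1.0 means the response matches the target."""
    longest = max(len(target), len(response))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(target, response) / longest


# A response such as "hat" for the target "cat" shares most of its form,
# whereas "dog" shares none of its letters in position.
scores = {resp: form_overlap("cat", resp) for resp in ("cat", "hat", "dog")}
```

A form score like this says nothing about meaning, so "dog" for "cat" scores zero despite being semantically related. Separating such error types would need a second, semantic measure, which is not sketched here.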

## Full-text entities

- **Diseases:** PNT (MESH:D013736), language disorders (MESH:D007806), Aphasia (MESH:D001037), Broca's, and conduction aphasia (MESH:D001039), Speech errors (MESH:D013064)
- **Chemicals:** water (MESH:D014867), asp (MESH:D001224), IPA (-)
- **Species:** Canis lupus familiaris (dog, subspecies) [taxon 9615], Sus scrofa (pig, species) [taxon 9823], Felis catus (cat, species) [taxon 9685], Homo sapiens (human, species) [taxon 9606], Cetacea (cetaceans, infraorder) [taxon 9721], Psittacidae (parrot, family) [taxon 9224]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12913347/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12913347/full.md

## References

11 references — full list in the complete paper: https://tomesphere.com/paper/PMC12913347/full.md

---
Source: https://tomesphere.com/paper/PMC12913347