# Comparing effect latencies in the visual world paradigm: Monte Carlo simulations to assess resampling-based procedures

**Authors:** Serge Minor

PMC · DOI: 10.3758/s13428-025-02934-6 · Behavior Research Methods · 2026-02-23

## TL;DR

This paper uses Monte Carlo simulations to evaluate how well resampling-based procedures detect between-group differences in effect latency in visual world paradigm (VWP) experiments.

## Contribution

The study introduces new latency measures with effect size thresholds and evaluates their performance in resampling-based statistical tests.
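This summary does not spell out how the thresholded latency measures are defined. The sketch below is a rough illustration only, built on three assumptions:

- Each participant contributes a per-bin effect time course, for example the proportion of looks to the target minus looks to a competitor in fixed-width time bins.
- The group curve is the per-bin mean across participants.
- Latency is the onset of the first bin, or first run of consecutive bins, where that curve exceeds a threshold.

The names `group_curve`, `onset_latency`, `threshold`, and `run_length` are hypothetical. The paper's measures may apply the threshold to a standardized effect size or use a different criterion.

```python
from statistics import mean


def group_curve(participants):
    """Per-bin mean effect across participants.

    `participants` is a list of equal-length lists; each inner list is one
    participant's (hypothetical) effect value per time bin.
    """
    n_bins = len(participants[0])
    return [mean(p[b] for p in participants) for b in range(n_bins)]


def onset_latency(curve, bin_ms, threshold, run_length=1):
    """Start time (ms) of the first run of `run_length` consecutive bins whose
    value exceeds `threshold`; returns None if no such run exists."""
    run = 0
    for b, value in enumerate(curve):
        # count consecutive supra-threshold bins, resetting on any miss
        run = run + 1 if value > threshold else 0
        if run == run_length:
            # report the onset of the run, not the bin where it completed
            return (b - run_length + 1) * bin_ms
    return None
```

For example, with 50 ms bins and a threshold of 0.1, the curve `[0.0, 0.02, 0.08, 0.15, 0.22, 0.25]` first exceeds the threshold in bin 3, giving a latency of 150 ms. Requiring `run_length > 1` guards against a single noisy bin triggering an early onset.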

## Key findings

- Permutation tests and bootstrapped percentile CIs showed the highest overall power without inflating Type I error rates.
- Applying an effect size threshold in latency estimation led to consistent increases in statistical power.
- Resampling by participant was robust to increases in cross-subject variability, whereas bootstrapping within participants and time bins led to elevated Type I error rates.
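A by-participant permutation test for a latency difference could look roughly like the sketch below. It is not the paper's implementation and rests on these assumptions:

- Each participant contributes a per-bin effect time course.
- Group latency is the first bin where the group-mean curve exceeds a fixed threshold.
- A group that never crosses the threshold is assigned the end of the analysis window, so every permutation yields a number.

`BIN_MS`, `THRESHOLD`, `latency`, and `permutation_test` are hypothetical names with illustrative values.

```python
import random
from statistics import mean

BIN_MS = 50       # hypothetical time-bin width in ms
THRESHOLD = 0.1   # hypothetical threshold on the group-mean effect


def latency(participants):
    """Onset (ms) of the first bin whose group-mean effect exceeds THRESHOLD.

    If the threshold is never crossed, return the end of the window
    (an assumption made so that every resample produces a value).
    """
    n_bins = len(participants[0])
    for b in range(n_bins):
        if mean(p[b] for p in participants) > THRESHOLD:
            return b * BIN_MS
    return n_bins * BIN_MS


def permutation_test(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation test on latency(B) - latency(A).

    Group labels are shuffled across whole participants, so each
    participant's time course stays intact in every permutation.
    """
    rng = random.Random(seed)
    observed = latency(group_b) - latency(group_a)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = latency(pooled[n_a:]) - latency(pooled[:n_a])
        if abs(diff) >= abs(observed):
            extreme += 1
    # add-one correction keeps the Monte Carlo p-value strictly positive
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value
```

Shuffling whole participants rather than individual trials or time bins preserves each participant's time course. The key findings tie this property to robustness against cross-subject variability.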

## Abstract

In a series of Monte Carlo simulation studies, we evaluated the power and Type I error rates of resampling-based procedures for comparing effect latencies between groups in the visual world paradigm (VWP). Resampling-based methods, while versatile, are known to fail in certain cases. Therefore, validation of such methods through simulation is crucial. We compared permutation- and bootstrapping-based tests combined with different methods for measuring effect latency while manipulating sample size and true effect size. Alongside previously used latency measures, we tested new measures involving the application of an effect size threshold. Simulations were based on existing VWP datasets representing different effect types (preferential looks triggered by lexical vs. grammatical cues, cohort competitor effects in word recognition) and data collection methods (infrared- vs. webcam-based eye tracking). A total of 156,000 simulations were conducted across five studies, involving 548 million resampled datasets. The main findings are as follows: (1) With sufficient sample sizes, tests were effective in detecting latency differences of 200–300 ms in sentence processing tasks, and as small as 100 ms in word recognition. (2) The permutation test and bootstrapped percentile CIs exhibited the highest overall power without inflation of Type I error rates. (3) Applying an effect size threshold in latency estimation led to consistent increases in statistical power. (4) Resampling by participant was robust to increases in cross-subject variability; in contrast, bootstrapping within participants and time bins led to elevated Type I error rates. Based on these results, we offer recommendations for using non-parametric resampling-based procedures to compare group latencies in VWP experiments.
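A bootstrapped percentile CI for the latency difference, resampling participants as units, might be sketched as follows. The sketch assumes per-participant effect time courses and defines group latency as the first bin where the group-mean curve exceeds a fixed threshold. If the threshold is never crossed, the sketch assigns the end of the analysis window. `BIN_MS`, `THRESHOLD`, `latency`, and `bootstrap_percentile_ci` are hypothetical names, and the percentile indexing is one simple convention among several.

```python
import random
from statistics import mean

BIN_MS = 50       # hypothetical time-bin width in ms
THRESHOLD = 0.1   # hypothetical threshold on the group-mean effect


def latency(participants):
    """Onset (ms) of the first bin whose group-mean effect exceeds THRESHOLD,
    or the end of the window if it never does (an assumption)."""
    n_bins = len(participants[0])
    for b in range(n_bins):
        if mean(p[b] for p in participants) > THRESHOLD:
            return b * BIN_MS
    return n_bins * BIN_MS


def bootstrap_percentile_ci(group_a, group_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for latency(B) - latency(A).

    Whole participants are drawn with replacement, separately within each
    group; time bins are never resampled individually.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        boot_a = rng.choices(group_a, k=len(group_a))
        boot_b = rng.choices(group_b, k=len(group_b))
        diffs.append(latency(boot_b) - latency(boot_a))
    diffs.sort()
    # symmetric order statistics for the alpha/2 and 1 - alpha/2 percentiles
    lo_idx = int(alpha / 2 * n_boot)
    hi_idx = n_boot - 1 - lo_idx
    return diffs[lo_idx], diffs[hi_idx]
```

Under this sketch, the null hypothesis of equal latencies is rejected at level `alpha` when the interval excludes zero. Drawing participants as units mirrors the by-participant scheme. The abstract reports that bootstrapping within participants and time bins instead inflated Type I error rates.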

The online version contains supplementary material available at 10.3758/s13428-025-02934-6.

## Full-text entities

- **Diseases:** COVID-19 (MESH:D000086382)
- **Chemicals:** FEM (-)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12929238/full.md

## Figures

38 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12929238/full.md

## References

6 references — full list in the complete paper: https://tomesphere.com/paper/PMC12929238/full.md

---
Source: https://tomesphere.com/paper/PMC12929238