# Everything, altogether, all at once: Addressing data challenges when measuring speech intelligibility through entropy scores

**Authors:** Jose Manuel Rivera Espejo, Sven De Maeyer, Steven Gillis

PMC · DOI: 10.3758/s13428-024-02457-6 · Behavior Research Methods · 2024-07-24

## TL;DR

This paper shows that a Bayesian beta-proportion GLLAMM handles the combined features of entropy-based intelligibility data (boundedness, measurement error, clustering, outliers, and heteroscedasticity) better than a conventional normal linear mixed model.
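
One way to see why a normal model struggles with bounded scores is to measure how much probability its predictive distribution places outside the [0, 1] range that entropy scores can take. The sketch below does this for a hypothetical speaker whose utterances are transcribed almost consistently. The mean (0.05) and standard deviation (0.08) are illustrative values, not estimates from the paper, and the helper names are invented for this example.

```python
import math


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """CDF of a normal distribution, computed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def mass_outside_unit_interval(mean: float, sd: float) -> float:
    """Probability that a normal predictive distribution assigns outside [0, 1]."""
    below = normal_cdf(0.0, mean, sd)
    above = 1.0 - normal_cdf(1.0, mean, sd)
    return below + above


# Hypothetical speaker: mean entropy near 0 with modest spread (illustrative only).
leak = mass_outside_unit_interval(mean=0.05, sd=0.08)
print(f"normal predictive mass outside [0, 1]: {leak:.3f}")
```

Under these assumed values roughly a quarter of the normal predictive mass falls below zero, on scores that cannot occur. A beta-proportion likelihood, by construction, assigns no probability outside (0, 1).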

## Contribution

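The paper demonstrates the effectiveness of the Bayesian beta-proportion generalized linear latent and mixed model (GLLAMM) for analyzing entropy scores, derived from transcriptions of spontaneous speech, as measures of speech intelligibility.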
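
The "beta-proportion" part of the model describes each bounded score by a mean `mu` in (0, 1) and a precision `phi`. The minimal sketch below assumes the standard mean-precision parameterization used in beta regression, `alpha = mu * phi` and `beta = (1 - mu) * phi`. The paper's exact priors and link functions are given in the full text. The sketch shows two properties that suit entropy scores: every draw stays inside (0, 1), and the variance `mu * (1 - mu) / (1 + phi)` shrinks as the mean approaches either bound, which is one way the model accommodates heteroscedasticity. The value `phi = 10` is illustrative only.

```python
import random
import statistics


def beta_proportion_params(mu: float, phi: float) -> tuple[float, float]:
    """Map mean mu in (0, 1) and precision phi > 0 to Beta(alpha, beta) shapes."""
    if not (0.0 < mu < 1.0) or phi <= 0.0:
        raise ValueError("need 0 < mu < 1 and phi > 0")
    return mu * phi, (1.0 - mu) * phi


def beta_proportion_variance(mu: float, phi: float) -> float:
    """Variance implied by the mean-precision parameterization."""
    return mu * (1.0 - mu) / (1.0 + phi)


rng = random.Random(2024)
phi = 10.0  # illustrative precision, not a value estimated in the paper
for mu in (0.1, 0.5, 0.9):
    a, b = beta_proportion_params(mu, phi)
    draws = [rng.betavariate(a, b) for _ in range(20_000)]
    print(f"mu={mu:.1f}  theoretical var={beta_proportion_variance(mu, phi):.4f}  "
          f"simulated var={statistics.variance(draws):.4f}  "
          f"range=({min(draws):.3f}, {max(draws):.3f})")
```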

## Key findings

- The beta-proportion GLLAMM predicted the observed entropy scores more accurately than the normal linear mixed model (LMM).
- The model successfully estimated latent intelligibility from entropy scores.
- The model enabled exploration of hypotheses about speaker-related factors affecting intelligibility.
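
These findings rest on a data-generating process in which an unobserved speaker-level intelligibility drives observed entropy scores that are clustered within speakers. The simulation below is a hypothetical sketch of such a process, not the authors' model. The covariate name `hearing_impaired`, the linear predictor `logit(mu) = a0 - SI`, and all parameter values are assumptions chosen for illustration. Higher latent intelligibility lowers the expected entropy, because entropy measures disagreement among transcribers.

```python
import math
import random
import statistics
from dataclasses import dataclass


def inv_logit(x: float) -> float:
    """Map a real-valued linear predictor to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


@dataclass
class Utterance:
    speaker: int
    hearing_impaired: int  # hypothetical binary speaker-level covariate
    latent_si: float       # latent intelligibility; unobserved in real data
    entropy: float         # observed score in (0, 1)


def simulate(n_speakers: int = 20, n_utts: int = 10, a0: float = 0.0,
             b_hi: float = -1.0, sd_speaker: float = 0.5, phi: float = 15.0,
             seed: int = 1) -> list[Utterance]:
    """Simulate utterances clustered within speakers (all values illustrative)."""
    rng = random.Random(seed)
    data = []
    for i in range(n_speakers):
        hi = i % 2  # alternate speakers between the two hypothetical groups
        # Latent intelligibility: group effect plus a speaker-level random deviation.
        si = b_hi * hi + rng.gauss(0.0, sd_speaker)
        # Higher intelligibility -> lower expected entropy (more listener agreement).
        mu = inv_logit(a0 - si)
        for _ in range(n_utts):
            # Beta-proportion draw with mean mu and precision phi.
            y = rng.betavariate(mu * phi, (1.0 - mu) * phi)
            data.append(Utterance(i, hi, si, y))
    return data


data = simulate()
for group in (0, 1):
    scores = [u.entropy for u in data if u.hearing_impaired == group]
    print(f"group {group}: mean entropy {statistics.mean(scores):.3f} "
          f"over {len(scores)} utterances")
```

A fitted model works in the reverse direction, recovering the latent intelligibility and the covariate effect from the observed scores alone; the abstract notes that the authors implemented theirs in a probabilistic programming language.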

## Abstract

When investigating unobservable, complex traits, data collection and aggregation processes can introduce distinctive features to the data such as boundedness, measurement error, clustering, outliers, and heteroscedasticity. Failure to collectively address these features can result in statistical challenges that prevent the investigation of hypotheses regarding these traits. This study aimed to demonstrate the efficacy of the Bayesian beta-proportion generalized linear latent and mixed model (beta-proportion GLLAMM) (Rabe-Hesketh et al., Psychometrika, 69(2), 167–90, 2004a, Journal of Econometrics, 128(2), 301–23, 2004c, 2004b; Skrondal and Rabe-Hesketh 2004) in handling data features when exploring research hypotheses concerning speech intelligibility. To achieve this objective, the study reexamined data from transcriptions of spontaneous speech samples initially collected by Boonen et al. (Journal of Child Language, 50(1), 78–103, 2023). The data were aggregated into entropy scores. The research compared the prediction accuracy of the beta-proportion GLLAMM with the normal linear mixed model (LMM) (Holmes et al., 2019) and investigated its capacity to estimate a latent intelligibility from entropy scores. The study also illustrated how hypotheses concerning the impact of speaker-related factors on intelligibility can be explored with the proposed model. The beta-proportion GLLAMM was not free of challenges; its implementation required formulating assumptions about the data-generating process and knowledge of probabilistic programming languages, both central to Bayesian methods. Nevertheless, results indicated the superiority of the model in predicting empirical phenomena over the normal LMM, and its ability to quantify a latent potential intelligibility. Additionally, the proposed model facilitated the exploration of hypotheses concerning speaker-related factors and intelligibility. 
Ultimately, this research has implications for researchers and data analysts interested in quantitatively measuring intricate, unobservable constructs while accurately predicting the empirical phenomena.
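
The abstract states that the transcriptions were aggregated into entropy scores. The minimal sketch below assumes the common normalized Shannon entropy over the distinct transcriptions that N listeners give for the same word, H = Σ p_k log2(1/p_k) / log2 N, averaged over the words of an utterance. A score of 0 means every listener wrote the same thing; a score of 1 means every listener wrote something different. The normalization and the word-to-utterance averaging are assumptions here, and the function names are hypothetical; the full text gives the exact procedure.

```python
import math
from collections import Counter


def word_entropy(transcriptions: list[str]) -> float:
    """Base-2 Shannon entropy of listeners' transcriptions of one word,
    normalized by log2(N): 0 = full agreement, 1 = every listener differs."""
    n = len(transcriptions)
    if n < 2:
        raise ValueError("need at least two transcriptions")
    counts = Counter(transcriptions)
    h = sum((c / n) * math.log2(n / c) for c in counts.values())
    return h / math.log2(n)


def utterance_entropy(words: list[list[str]]) -> float:
    """Average word-level entropy over an utterance (assumed aggregation)."""
    return sum(word_entropy(w) for w in words) / len(words)


# Five hypothetical listeners transcribing a three-word utterance.
words = [
    ["ball"] * 5,                              # full agreement
    ["ball", "ball", "doll", "ball", "ball"],  # one listener differs
    ["cat", "hat", "bat", "cap", "mat"],       # no agreement
]
for w in words:
    print(w, round(word_entropy(w), 3))
print("utterance score:", round(utterance_entropy(words), 3))
```

Here the three words score 0, about 0.31, and 1. Scores of exactly 0 or 1 lie on the boundary of the beta distribution's support, so a beta-proportion model needs some treatment of such values; consult the full text for how the authors handle them.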

## Full-text entities

- **Diseases:** hearing difficulties (MESH:D034381), HI (MESH:C538424)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11362487/full.md

## Figures

15 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11362487/full.md

## References

30 references — full list in the complete paper: https://tomesphere.com/paper/PMC11362487/full.md

---
Source: https://tomesphere.com/paper/PMC11362487