# Exploring the Guessing‐Game Experimental Paradigm: Inferences From Closed‐ Versus Open‐Ended Semantic Space

**Authors:** Svetlana Kuleshova, Aleksandra Ćwiek, Stefan Hartmann, Michael Pleyer, Marta Sibierska, Marek Placiński, Johan Blomberg, Przemysław Żywiczyński, Sławomir Wacewicz

PMC · DOI: 10.1111/cogs.70199 · Cognitive Science · 2026-03-23

## TL;DR

This study compares how response format (closed multiple choice versus open-ended) and evaluation method affect measured comprehension of novel vocalizations and ape gestures, showing that open-ended responses reveal semantic patterns that multiple-choice answers do not.

## Contribution

The paper systematically compares three evaluation methods for open-ended responses in signal comprehension experiments (exact matching, graded similarity ratings, and computational semantic similarity) and shows that each reveals a different semantic landscape.

## Key findings

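- Success in signal comprehension depends primarily on properties of the signals (their semantic category and degree of transparency) rather than on individual participant abilities.
- Measured success for open-ended responses is very low under exact matching, moderate under graded similarity ratings, and substantially greater under computational semantic similarity, which captures broader thematic connections.
- Participants often reliably distinguish broad categories (actions vs. objects, animals vs. artifacts) but rarely identify specific concepts.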

## Abstract

How we measure success in signal comprehension experiments fundamentally shapes our conclusions. Two recent studies have demonstrated that humans can guess the meanings of novel vocalizations and ape gestures above chance when selecting from limited alternatives. We replicated both experiments using open‐ended responses instead of multiple choice. For the vocalization data, where participants provided single‐word or short‐phrase responses, we systematically compared three evaluation methods applied to the same responses: exact matching, graded similarity ratings, and computational semantic similarity. For the gesture data, we applied graded similarity ratings. Each evaluation method revealed a different semantic landscape. Participants’ success was very low when measured by exact matching, moderate by similarity ratings, and substantially greater by computational measures, which capture broader thematic connections. Despite these differences, a consistent pattern emerged across both datasets and all evaluation methods: success was determined primarily by properties of the signals (their semantic category and degree of transparency) rather than individual participant abilities. Participants often reliably distinguished broad categories (actions vs. objects, animals vs. artifacts) but rarely identified specific concepts—and these distinct patterns only became visible through a combination of evaluation methods. In sum, our results partly align with the original studies yet also diverge in ways conducive to different conclusions about naïve humans’ ability to understand novel vocalizations or ape gestures. We show that closed‐ versus open‐ended response formats, and different evaluation scales, function as complementary research tools rather than competing approaches. Each reveals different aspects of how humans navigate semantic space when interpreting novel signals. 
Experimental and evaluation designs are, therefore, not a technical detail but a theoretical choice about which semantic relationships we seek to expose.
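
To make the contrast between evaluation methods concrete, here is a minimal Python sketch that scores the same open-ended response in two ways: strict exact matching and a cosine-based semantic similarity. It is an illustration under stated assumptions, not the authors' pipeline. The `TOY_VECTORS` table, the example words, and all function names are hypothetical; a real analysis would presumably draw on trained word embeddings, and the graded similarity ratings described in the abstract come from human raters, so they are not modelled here.

```python
import math

# Hypothetical toy embeddings, invented for illustration only.
# A real analysis would use vectors from a trained language model.
TOY_VECTORS = {
    "tiger": (0.9, 0.1, 0.0),
    "lion": (0.8, 0.2, 0.0),
    "knife": (0.0, 0.1, 0.9),
}


def normalize(text):
    """Lowercase and trim a response so trivial variants compare equal."""
    return text.strip().lower()


def exact_match(response, target):
    """Return 1.0 only if the normalized response equals the target."""
    return 1.0 if normalize(response) == normalize(target) else 0.0


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def semantic_score(response, target, vectors=TOY_VECTORS):
    """Cosine similarity of response and target, or None if a word is unknown."""
    r = vectors.get(normalize(response))
    t = vectors.get(normalize(target))
    if r is None or t is None:
        return None
    return cosine_similarity(r, t)


def score_response(response, target):
    """Score one open-ended response under both evaluation methods."""
    return {
        "exact": exact_match(response, target),
        "semantic": semantic_score(response, target),
    }


if __name__ == "__main__":
    # "lion" fails exact matching against "tiger" but scores high semantically.
    print(score_response("lion", "tiger"))
    print(score_response("knife", "tiger"))
```

In this toy setup a response such as "lion" for the target "tiger" counts as a miss under exact matching yet receives a high semantic score, which mirrors why the choice of evaluation scale changes the apparent success rate.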

## Full-text entities

- **Diseases:** hearing disabilities (MESH:D006311)
- **Chemicals:** water (MESH:D014867)
- **Species:** Equus caballus (domestic horse, species) [taxon 9796], Pan paniscus (bonobo, species) [taxon 9597], Homo sapiens (human, species) [taxon 9606], Pan troglodytes (chimpanzee, species) [taxon 9598]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13007489/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13007489/full.md

## References

46 references — full list in the complete paper: https://tomesphere.com/paper/PMC13007489/full.md

---
Source: https://tomesphere.com/paper/PMC13007489