# Learning visual to auditory sensory substitution reveals flexibility in image to sound mapping

**Authors:** Asa Kucinkas, Chrysa Retsa, Peter B. L. Meijer, Mark T. Wallace, Monica Gori, Micah M. Murray

PMC · DOI: 10.1038/s41539-025-00385-4 · NPJ Science of Learning · 2025-12-03

## TL;DR

This study shows that people can quickly learn to recognize images from their sound encodings, and that more than one image-to-sound mapping can be learned. This suggests flexibility in how visual information can be translated into sound.

## Contribution

The study indicates that structured mappings, rather than fixed cross-modal correspondences, are key to learning to use visual-to-auditory sensory substitution devices (SSDs).

## Key findings

- Both traditional and reversed spatial mappings were learned effectively within 30 minutes.
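- Groups trained on structured mappings outperformed the arbitrary single-tone control group at recognizing novel stimuli, while the Traditional and Reversed groups did not differ from each other.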
- Mapping flexibility suggests SSDs can be customized for individual users and tasks.

## Abstract

Visual-to-auditory sensory substitution devices (SSDs) translate images to sounds. One SSD, The vOICe, translates a pixel’s vertical position into pitch and its horizontal position into time. This mapping is primarily based on technical considerations for preserving image content in human-audible sounds without presupposing intuitiveness, although some literature also invokes crossmodal correspondences in perception, such as pitch for elevation. We investigated these presuppositions and the efficacy of learning a traditional algorithm (i.e., pitch indicating elevation and time indicating azimuth) versus a reversed algorithm (i.e., pitch indicating azimuth and time indicating elevation), or an arbitrary single-tone control mapping (i.e., each visual stimulus was represented by a single non-systematic pitch–time pairing without structured spatial correspondences). Sixty sighted adults were randomly assigned to the Traditional, Reversed, or Control group. They completed learning and evaluation sessions using simplified black-and-white visual stimuli. Both the Traditional and Reversed groups learned their mappings within 30 minutes and successfully recognized novel stimuli, outperforming the Control group but not differing from each other. Structured mappings facilitate SSD learning. Mapping pixel position onto spectral-temporal acoustic axes appears flexible, rather than anchored to cross-modal correspondences. These findings reveal how SSDs may be rendered bespoke across user, stimuli, and functionality levels.
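To make the three conditions concrete, here is a minimal Python sketch that turns a binary image into a list of tone events under each scheme. It is illustrative only and is not the study's or The vOICe's implementation. All function names are hypothetical, and several parameters are assumptions rather than values from the paper: a 1 s scan, pitches spaced exponentially from 500 Hz to 5000 Hz, a left-to-right scan for the traditional mapping, a top-to-bottom scan with pitch rising from left to right for the reversed mapping, and a seeded random tone per stimulus for the control.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Tone:
    onset_s: float     # start time within the scan, in seconds
    duration_s: float  # tone length, in seconds
    freq_hz: float     # tone pitch, in Hz


def pitch_ladder(n, f_lo=500.0, f_hi=5000.0):
    """n pitches spaced exponentially from f_lo to f_hi (index 0 = lowest).
    The 500-5000 Hz range is an assumed, illustrative choice."""
    if n == 1:
        return [f_lo]
    ratio = (f_hi / f_lo) ** (1.0 / (n - 1))
    return [f_lo * ratio ** i for i in range(n)]


def traditional(image, scan_s=1.0):
    """Vertical position -> pitch, horizontal position -> time.
    Columns are scanned left to right; image[0] is the top row."""
    n_rows, n_cols = len(image), len(image[0])
    freqs = pitch_ladder(n_rows)
    slot = scan_s / n_cols
    tones = []
    for c in range(n_cols):
        for r in range(n_rows):
            if image[r][c]:
                # Higher in the image -> higher pitch.
                tones.append(Tone(c * slot, slot, freqs[n_rows - 1 - r]))
    return tones


def reversed_mapping(image, scan_s=1.0):
    """Horizontal position -> pitch, vertical position -> time.
    Rows are scanned top to bottom (assumed direction)."""
    n_rows, n_cols = len(image), len(image[0])
    freqs = pitch_ladder(n_cols)
    slot = scan_s / n_rows
    tones = []
    for r in range(n_rows):
        for c in range(n_cols):
            if image[r][c]:
                # Further right -> higher pitch (assumed direction).
                tones.append(Tone(r * slot, slot, freqs[c]))
    return tones


def control(stimulus_id, scan_s=1.0, seed=0):
    """One arbitrary tone per stimulus, unrelated to its pixel layout.
    The tone is derived deterministically from the stimulus id."""
    rng = random.Random(f"{seed}:{stimulus_id}")
    freq = rng.choice(pitch_ladder(16))
    onset = rng.uniform(0.0, 0.75 * scan_s)
    return [Tone(onset, 0.25 * scan_s, freq)]


if __name__ == "__main__":
    # A diagonal line from top-left to bottom-right.
    diag = [[1, 0, 0],
            [0, 1, 0],
            [0, 0, 1]]
    print([round(t.freq_hz) for t in traditional(diag)])       # falling sweep
    print([round(t.freq_hz) for t in reversed_mapping(diag)])  # rising sweep
    print(control("diagonal"))
```

With these assumptions, the same diagonal line is heard as a falling pitch sweep (5000, 1581, 500 Hz) under the traditional mapping and as a rising sweep (500, 1581, 5000 Hz) under the reversed one. Both are systematic, so shape is recoverable from sound. The control tone carries no such spatial structure, which is the contrast the study exploits.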

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12783664/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12783664/full.md

## References

9 references — full list in the complete paper: https://tomesphere.com/paper/PMC12783664/full.md

---
Source: https://tomesphere.com/paper/PMC12783664