# Listen to the Image

**Authors:** Di Hu, Dong Wang, Xuelong Li, Feiping Nie, Qi Wang

arXiv: 1904.09115 · 2019-04-22

## TL;DR

This paper proposes machine models that evaluate visual-to-auditory encoding schemes in place of laborious human-based assessment, with the aim of speeding up the improvement of sensory substitution devices for the blind.

## Contribution

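It proposes two cross-modal perception models, one for the late-blind case and one for the congenitally-blind case, which generate visual content from the translated sound. It also presents two optimization strategies for the primary encoding scheme and compares machine-based assessment against human-based experiments.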

## Key findings

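- Machine-based and human-based assessments give highly consistent results across different encoding schemes in the cross-modal generation task.
- Using a machine model to accelerate the evaluation of encoding-scheme optimizations and reduce experimental cost is feasible to some extent.
- Faster evaluation could speed up the upgrading of encoding schemes and so help the blind improve their visual perception ability.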
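One way to quantify "consistency" between two assessments of the same encoding schemes is a rank correlation. The sketch below computes Spearman's rank correlation in plain Python. It is an illustration only: this summary does not state which consistency measure the paper uses, and the function names `average_ranks` and `spearman` are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the two rank lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)
```

Given one score per encoding scheme from human participants and one from a machine model, `spearman(human_scores, machine_scores)` close to 1 would indicate that both assessments rank the schemes in the same order.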

## Abstract

Visual-to-auditory sensory substitution devices can assist the blind in sensing the visual environment by translating the visual information into a sound pattern. To improve the translation quality, the task performance of the blind is usually employed to evaluate different encoding schemes. In contrast to the toilsome human-based assessment, we argue that a machine model can also be developed for evaluation, and that it is more efficient. To this end, we first propose two distinct cross-modal perception models w.r.t. the late-blind and congenitally-blind cases, which aim to generate concrete visual contents based on the translated sound. To validate the functionality of the proposed models, two novel optimization strategies w.r.t. the primary encoding scheme are presented. Further, we conduct sets of human-based experiments and compare their results with the machine-based assessments in the cross-modal generation task. The highly consistent results w.r.t. different encoding schemes indicate that using a machine model to accelerate optimization evaluation and reduce experimental cost is feasible to some extent, which could dramatically promote the upgrading of encoding schemes and in turn help the blind to improve their visual perception ability.
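To make "translating the visual information into a sound pattern" concrete, here is a minimal sketch of one possible encoding. It assumes a column-scan scheme in which horizontal position maps to time, vertical position maps to pitch, and brightness maps to loudness. This summary does not specify the paper's primary encoding scheme, so the function name `encode_image_to_sound`, the linear frequency spacing, and all default parameter values are assumptions made for illustration.

```python
import math


def encode_image_to_sound(image, sample_rate=8000, column_duration=0.05,
                          f_min=500.0, f_max=3000.0):
    """Hypothetical encoding: column -> time, row -> pitch, brightness -> loudness.

    `image` is a list of rows of brightness values in [0, 1], top row first.
    Returns a list of audio samples in [-1, 1].
    """
    n_rows = len(image)
    n_cols = len(image[0])
    samples_per_column = round(sample_rate * column_duration)

    # Assumed mapping: top row gets the highest frequency, spaced linearly.
    if n_rows == 1:
        freqs = [f_max]
    else:
        step = (f_max - f_min) / (n_rows - 1)
        freqs = [f_max - step * r for r in range(n_rows)]

    samples = []
    for c in range(n_cols):  # scan the image left to right
        for k in range(samples_per_column):
            t = (c * samples_per_column + k) / sample_rate
            # Sum one sinusoid per row, weighted by that pixel's brightness.
            value = sum(image[r][c] * math.sin(2 * math.pi * freqs[r] * t)
                        for r in range(n_rows))
            samples.append(value / n_rows)  # normalise into [-1, 1]
    return samples
```

For a 2x2 image such as `[[0, 1], [1, 0]]`, the first 50 ms would contain only the low tone (bottom-left pixel) and the next 50 ms only the high tone (top-right pixel). A cross-modal perception model of the kind described in the abstract would take such a sound as input and try to generate the visual content from it.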

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.09115/full.md

## Figures

13 figures with captions in the complete paper: https://tomesphere.com/paper/1904.09115/full.md

## References

38 references — full list in the complete paper: https://tomesphere.com/paper/1904.09115/full.md

---
Source: https://tomesphere.com/paper/1904.09115