TL;DR
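This paper introduces a visual-to-auditory sensory substitution method based on deep recurrent autoencoders. It targets the two causes of lengthy training named in the abstract, the elongated substituting audio signal and the disregard for the compressive characteristics of human hearing, so that visually impaired users can adapt faster.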
Contribution
The paper presents a novel class of sensory substitution methods that train deep recurrent autoencoders for image-to-sound conversion and integrate computational hearing models, with the goal of shortening training and improving perceptual clarity.
Findings
Achieved above-chance accuracy after a few hours of training
Demonstrated viability of shortened audio signals for sensory substitution
Validated approach through experiments with blindfolded subjects
Abstract
Tens of millions of people live blind, and their number is ever increasing. Visual-to-auditory sensory substitution (SS) encompasses a family of cheap, generic solutions to assist the visually impaired by conveying visual information through sound. The required SS training is lengthy: months of effort are necessary to reach a practical level of adaptation. There are two reasons for the tedious training process: the elongated substituting audio signal, and the disregard for the compressive characteristics of the human hearing system. To overcome these obstacles, we developed a novel class of SS methods by training deep recurrent autoencoders for image-to-sound conversion. We successfully trained deep learning models on different datasets to execute visual-to-auditory stimulus conversion. By constraining the visual space, we demonstrated the viability of shortened substituting audio…
