Beyond Descriptions: A Generative Scene2Audio Framework for Blind and Low-Vision Users to Experience Vista Landscapes
Chitralekha Gupta, Jing Peng, Ashwin Ram, Shreyas Sridhar, Christophe Jouffrais, Suranga Nanayakkara

TL;DR
This paper introduces Scene2Audio, a generative audio framework that creates engaging nonverbal sounds to help blind and low-vision users experience scenic landscapes more vividly, going beyond purely descriptive tools.
Contribution
It presents a novel generative-model-based approach to scene audio synthesis that aims to make outdoor scene perception more aesthetic and immersive for BLV users.
Findings
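In a user study with 11 BLV participants, combining Scene2Audio sounds with speech created a better experience than speech alone and made scenes easier to imagine.
A mobile app "in-the-wild" study with 7 BLV users over more than a week showed the potential of Scene2Audio to enhance outdoor scene experiences.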
The framework bridges visual and auditory perception beyond descriptive aids.
Abstract
Current scene perception tools for Blind and Low Vision (BLV) individuals rely on spoken descriptions but lack engaging representations of visually pleasing distant environmental landscapes (Vista spaces). Our proposed Scene2Audio framework generates comprehensible and enjoyable nonverbal audio using generative models informed by psychoacoustics and principles of scene audio composition. Through a user study with 11 BLV participants, we found that combining the Scene2Audio sounds with speech creates a better experience than speech alone, as the sound effects complement the speech, making the scene easier to imagine. A mobile app "in-the-wild" study with 7 BLV users for more than a week further showed the potential of Scene2Audio in enhancing outdoor scene experiences. Our work bridges the gap between visual and auditory scene perception by moving beyond purely descriptive aids,…
