WorldScribe: Towards Context-Aware Live Visual Descriptions
Ruei-Che Chang, Yuxuan Liu, Anhong Guo

TL;DR
WorldScribe generates real-time, customizable visual descriptions for blind users, adapting to their intent, visual surroundings, and sound conditions to support independence and understanding.
Contribution
We introduce WorldScribe, a novel system that provides adaptive, context-aware live visual descriptions tailored to user intent, environment, and sound, supported by a multi-modal recognition pipeline.
Findings
Provides real-time, accurate descriptions
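Adapts descriptions to scene stability (succinct for dynamic scenes, longer and more detailed for stable ones) and to sound (louder in noisy environments, paused when conversations start)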
Enhances user understanding and independence
Abstract
Automated live visual descriptions can aid blind people in understanding their surroundings with autonomy and independence. However, providing descriptions that are rich, contextual, and just-in-time has been a long-standing challenge in accessibility. In this work, we develop WorldScribe, a system that generates automated live real-world visual descriptions that are customizable and adaptive to users' contexts: (i) WorldScribe's descriptions are tailored to users' intents and prioritized based on semantic relevance. (ii) WorldScribe is adaptive to visual contexts, e.g., providing consecutively succinct descriptions for dynamic scenes, while presenting longer and detailed ones for stable settings. (iii) WorldScribe is adaptive to sound contexts, e.g., increasing volume in noisy environments, or pausing when conversations start. Powered by a suite of vision, language, and sound…
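The adaptive behavior in points (ii) and (iii) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Context` fields, the `plan_description` function, and both thresholds are hypothetical names and values chosen only to show the decision logic the abstract describes (succinct versus detailed descriptions, and volume or pause decisions based on sound).

```python
from dataclasses import dataclass


@dataclass
class Context:
    # All fields are illustrative assumptions, not values from the paper.
    scene_change: float   # 0..1 score; higher means a more dynamic scene
    noise_db: float       # ambient noise level in dB
    conversation: bool    # True when a conversation is detected


def plan_description(ctx: Context,
                     dynamic_threshold: float = 0.5,
                     noisy_db: float = 70.0) -> tuple[str, str]:
    """Return (detail_level, audio_action) for the current context."""
    # Dynamic scenes get succinct descriptions; stable scenes get detailed ones.
    detail = "succinct" if ctx.scene_change >= dynamic_threshold else "detailed"

    # A detected conversation takes priority over noise: pause instead of
    # speaking over people. Otherwise raise the volume in noisy settings.
    if ctx.conversation:
        audio = "pause"
    elif ctx.noise_db >= noisy_db:
        audio = "raise_volume"
    else:
        audio = "normal"
    return detail, audio
```

For example, under these assumed thresholds, `plan_description(Context(0.9, 80.0, False))` selects a succinct description at raised volume, while a stable, quiet scene yields a detailed description at normal volume.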
