WorldScribe: Towards Context-Aware Live Visual Descriptions

Ruei-Che Chang; Yuxuan Liu; Anhong Guo

arXiv:2408.06627·cs.HC·August 14, 2024

TL;DR

WorldScribe is a system that generates real-time, customizable visual descriptions for blind users, adapting to their context, environment, and sound conditions to improve independence and understanding.

Contribution

We introduce WorldScribe, a novel system that provides adaptive, context-aware live visual descriptions tailored to user intent, environment, and sound, supported by a multi-modal recognition pipeline.

Findings

01 Provides real-time, accurate descriptions

02 Adapts descriptions based on scene stability and noise levels

03 Enhances user understanding and independence

Abstract

Automated live visual descriptions can aid blind people in understanding their surroundings with autonomy and independence. However, providing descriptions that are rich, contextual, and just-in-time has been a long-standing challenge in accessibility. In this work, we develop WorldScribe, a system that generates automated live real-world visual descriptions that are customizable and adaptive to users' contexts: (i) WorldScribe's descriptions are tailored to users' intents and prioritized based on semantic relevance. (ii) WorldScribe is adaptive to visual contexts, e.g., providing consecutively succinct descriptions for dynamic scenes, while presenting longer and detailed ones for stable settings. (iii) WorldScribe is adaptive to sound contexts, e.g., increasing volume in noisy environments, or pausing when conversations start. Powered by a suite of vision, language, and sound…
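
The adaptive behaviors in (ii) and (iii) can be pictured as a small decision rule. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `plan_presentation`, the `PresentationPlan` type, the 70 dB noise threshold, and the volume values are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass

# Hypothetical values chosen for illustration; the paper's abstract gives no numbers.
NOISE_THRESHOLD_DB = 70.0
BASE_VOLUME = 0.5
LOUD_VOLUME = 0.9


@dataclass
class PresentationPlan:
    detail: str    # "succinct" for dynamic scenes, "detailed" for stable ones
    volume: float  # playback volume in the range 0.0 to 1.0
    paused: bool   # True while a conversation is in progress


def plan_presentation(scene_is_stable: bool,
                      noise_db: float,
                      conversation_active: bool) -> PresentationPlan:
    """Map visual and sound context to a description presentation plan."""
    # Visual context: stable scenes allow longer, more detailed descriptions.
    detail = "detailed" if scene_is_stable else "succinct"
    # Sound context: raise the volume when the environment is noisy.
    volume = LOUD_VOLUME if noise_db >= NOISE_THRESHOLD_DB else BASE_VOLUME
    # Sound context: pause spoken output when a conversation starts.
    return PresentationPlan(detail=detail, volume=volume, paused=conversation_active)
```

For example, `plan_presentation(False, 80.0, False)` would select succinct descriptions at the louder volume, while `plan_presentation(True, 40.0, True)` would select detailed descriptions but keep output paused until the conversation ends.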
