Context-Aware Image Descriptions for Web Accessibility
Ananya Gubbi Mohanbabu, Amy Pavel

TL;DR
This paper presents a Chrome Extension that incorporates webpage context into image descriptions for blind and low vision users. In a user study, participants rated the context-aware descriptions higher in quality and relevance than context-free ones.
Contribution
The study introduces a method for generating context-aware image descriptions with GPT-4V, using context extracted from the surrounding webpage to improve accessibility for blind and low vision (BLV) users.
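To make the difference between context-free and context-aware prompting concrete, here is a minimal sketch of how webpage context might be folded into the text prompt that accompanies an image. It is not the authors' pipeline: the `WebpageContext` fields, the function names, the prompt wording, and the `max_chars` limit are all hypothetical, and the call to a vision-to-language model is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class WebpageContext:
    """Hypothetical container for context scraped from the page around an image."""
    page_title: str
    nearby_text: str
    alt_text: str = ""


def build_context_free_prompt() -> str:
    """Baseline prompt: the model sees the image with no page information."""
    return "Describe the image for a blind or low vision reader."


def build_context_aware_prompt(context: WebpageContext, max_chars: int = 500) -> str:
    """Combine webpage context into the text prompt sent alongside the image.

    The wording is illustrative only, not the prompt used in the paper.
    """
    # Truncate the nearby text so the prompt stays short (assumed limit).
    nearby = context.nearby_text.strip()[:max_chars]
    lines = [
        build_context_free_prompt(),
        f"Page title: {context.page_title.strip()}",
        f"Text near the image: {nearby}",
    ]
    # Mention existing alt text only when the page provides some.
    if context.alt_text.strip():
        lines.append(f"Existing alt text: {context.alt_text.strip()}")
    lines.append("Focus on details relevant to this page's content.")
    return "\n".join(lines)
```

Under these assumptions, the only difference between the two conditions is the text prompt: the image itself is unchanged, and the context-aware prompt adds the page title, nearby text, and any existing alt text.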
Findings
Participants preferred context-aware descriptions over context-free ones.
Context-aware descriptions scored higher in quality, imaginability, relevance, and plausibility.
Participants expressed interest in using context-aware descriptions across various online platforms.
Abstract
Blind and low vision (BLV) internet users access images on the web via text descriptions. New vision-to-language models such as GPT-4V, Gemini, and LLaVa can now provide detailed image descriptions on-demand. While prior research and guidelines state that BLV audiences' information preferences depend on the context of the image, existing tools for accessing vision-to-language models provide only context-free image descriptions by generating descriptions for the image alone without considering the surrounding webpage context. To explore how to integrate image context into image descriptions, we designed a Chrome Extension that automatically extracts webpage context to inform GPT-4V-generated image descriptions. We gained feedback from 12 BLV participants in a user study comparing typical context-free image descriptions to context-aware image descriptions. We then further evaluated our…
