Context Matters for Image Descriptions for Accessibility: Challenges for Referenceless Evaluation Metrics
Elisa Kreiss, Cynthia Bennett, Shayan Hooshmand, Eric Zelikman, Meredith Ringel Morris, Christopher Potts

TL;DR
This paper highlights the importance of context in image descriptions for accessibility, critiques current referenceless metrics for ignoring context, and proposes a context-aware version of CLIPScore to better evaluate descriptions for blind and low vision users.
Contribution
It identifies the limitations that existing referenceless metrics inherit from ignoring context, and introduces a contextual variant of CLIPScore to better evaluate descriptions for accessibility.
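The following is background on the metric being extended. Standard CLIPScore (Hessel et al., 2021) rates a description by the rescaled cosine similarity between CLIP's image and text embeddings, w · max(cos(image, text), 0) with w = 2.5. It never sees the text surrounding the image. The sketch below shows that score and one simple way context could enter it: interpolating with the description's similarity to a context embedding. This interpolation is a hypothetical choice. The function `contextual_clip_score` and its weight `alpha` are assumptions for illustration, not the paper's formulation. The embeddings are plain lists standing in for CLIP encoder outputs.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def clip_score(image_emb: Sequence[float], text_emb: Sequence[float], w: float = 2.5) -> float:
    """Referenceless CLIPScore: w * max(cos(image, text), 0). Ignores context entirely."""
    return w * max(cosine(image_emb, text_emb), 0.0)


def contextual_clip_score(
    image_emb: Sequence[float],
    desc_emb: Sequence[float],
    context_emb: Sequence[float],
    alpha: float = 0.5,  # hypothetical weight on the context term
    w: float = 2.5,
) -> float:
    """HYPOTHETICAL context-aware variant (not the paper's exact method):
    linearly interpolate image-description and context-description similarity."""
    img_term = max(cosine(image_emb, desc_emb), 0.0)
    ctx_term = max(cosine(context_emb, desc_emb), 0.0)
    return w * ((1 - alpha) * img_term + alpha * ctx_term)


if __name__ == "__main__":
    image, desc, context = [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]
    print(clip_score(image, desc))                      # 2.5
    print(contextual_clip_score(image, desc, context))  # 1.25
```

In this toy case, the description matches the image perfectly but is unrelated to its context. Plain CLIPScore gives it the maximum score. The context-aware sketch lowers the score, which mirrors the paper's argument that context-blind metrics can reward descriptions that BLV users would find less useful.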
Findings
Current metrics do not align with BLV users' needs.
Contextual information significantly impacts description quality.
A context-aware CLIPScore shows promise as a better evaluation metric.
Abstract
Few images on the Web receive alt-text descriptions that would make them accessible to blind and low vision (BLV) users. Image-based NLG systems have progressed to the point where they can begin to address this persistent societal problem, but these systems will not be fully successful unless we evaluate them on metrics that guide their development correctly. Here, we argue against current referenceless metrics -- those that don't rely on human-generated ground-truth descriptions -- on the grounds that they do not align with the needs of BLV users. The fundamental shortcoming of these metrics is that they do not take context into account, whereas contextual information is highly valued by BLV users. To substantiate these claims, we present a study with BLV participants who rated descriptions along a variety of dimensions. An in-depth analysis reveals that the lack of context-awareness…
Taxonomy
Topics: Multimodal Machine Learning Applications · Text Readability and Simplification · Digital Accessibility for Disabilities
Methods: ALIGN
