GenAssist: Making Image Generation Accessible
Mina Huh, Yi-Hao Peng, Amy Pavel

TL;DR
GenAssist is a system that enhances accessibility for blind and low vision creators by enabling them to verify, compare, and understand generated images through an integrated interface powered by advanced language and vision models.
Contribution
GenAssist introduces an accessible interface for text-to-image generation that helps BLV creators verify and interpret generated images, addressing a key accessibility challenge.
Findings
BLV creators found that GenAssist simplifies image selection.
The system improves understanding of generated images.
Participants reported increased confidence in image creation.
Abstract
Blind and low vision (BLV) creators use images to communicate with sighted audiences. However, creating or retrieving images is challenging for BLV creators as it is difficult to use authoring tools or assess image search results. Thus, creators limit the types of images they create or recruit sighted collaborators. While text-to-image generation models let creators generate high-fidelity images based on a text description (i.e. prompt), it is difficult to assess the content and quality of generated images. We present GenAssist, a system to make text-to-image generation accessible. Using our interface, creators can verify whether generated image candidates followed the prompt, access additional details in the image not specified in the prompt, and skim a summary of similarities and differences between image candidates. To power the interface, GenAssist uses a large language model to…
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
