Images Speak Volumes: User-Centric Assessment of Image Generation for Accessible Communication

Miriam Anschütz; Tringa Sylaj; Georg Groh

arXiv:2410.03430 · cs.CV · October 7, 2024

TL;DR

This study evaluates the potential of text-to-image models to generate customizable, accessible images for explanatory texts, highlighting current limitations and the need for human oversight to meet user needs.

Contribution

The paper benchmarks seven text-to-image generation models (four open-source, three closed-source) and assesses their suitability for creating accessible images, combining an evaluation of the generated images with a user study among the E2R target group.

Findings

01 Some models perform remarkably well

02 None are ready for large-scale autonomous use

03 Human supervision remains essential

Abstract

Explanatory images play a pivotal role in accessible and easy-to-read (E2R) texts. However, the images available in online databases are not tailored toward the respective texts, and the creation of customized images is expensive. In this large-scale study, we investigated whether text-to-image generation models can close this gap by providing customizable images quickly and easily. We benchmarked seven image generation models, four open-source and three closed-source, and provide an extensive evaluation of the resulting images. In addition, we performed a user study with people from the E2R target group to examine whether the images met their requirements. We find that some of the models show remarkable performance, but none of the models are ready to be used at a larger scale without human supervision. Our research is an important step toward facilitating the creation of accessible…

Code & Models

Repositories

MiriUll/Image-Generation-for-Accessible-Communication
PyTorch · Official

Videos

Images Speak Volumes: User-Centric Assessment of Image Generation for Accessible Communication· underline

Taxonomy

Topics: Subtitles and Audiovisual Media