Exploring the use of Generative AI to Support Automated Just-in-Time Programming for Visual Scene Displays

Cynthia Zastudil; Christine Holyfield; Christine Kapp; Xandria Crosland; Elizabeth Lorah; Tara Zimmerman; Stephen MacNeil

arXiv:2408.11137 · cs.HC · August 22, 2024 · 2 cites

Open Access

TL;DR

This study explores using large multimodal models like GPT-4V to automatically generate relevant communication options for visual scene displays, aiming to reduce manual configuration and improve AAC device support.

Contribution

The paper demonstrates the feasibility of using large multimodal models (LMMs) to create contextually relevant communication options and compares them with expert-generated options in AAC applications.

Findings

01. LMM-generated options were often contextually relevant and similar to human-created options.

02. Experts found the AI-generated options promising but identified key questions for deployment.

03. The study provides initial evidence supporting AI integration in AAC devices.

Abstract

Millions of people worldwide rely on alternative and augmentative communication (AAC) devices to communicate. Visual scene displays (VSDs) can enhance communication for these individuals by embedding communication options within contextualized images. However, existing VSDs often present default images that may lack relevance or require manual configuration, placing a significant burden on communication partners. In this study, we assess the feasibility of leveraging large multimodal models (LMMs), such as GPT-4V, to automatically create communication options for VSDs. Communication options were sourced from an LMM and from speech-language pathologists (SLPs) and AAC researchers (N=13), then evaluated through an expert assessment conducted by the SLPs and AAC researchers. We present the study's findings, supplemented by insights from semi-structured interviews (N=5) about SLPs' and AAC researchers'…

Taxonomy

Topics: Data Visualization and Analytics · Interactive and Immersive Displays · Computer Graphics and Visualization Techniques