Say It My Way: Exploring Control in Conversational Visual Question Answering with Blind Users
Farnaz Zamiri Zeraati, Yang Trista Cao, Yuehan Qiao, Hal Daumé III, Hernisa Kacorri

TL;DR
This study explores how blind users customize interactions with a conversational visual question answering system, revealing the importance of control and prompting techniques to improve accessibility and user experience.
Contribution
It introduces a detailed analysis of user-driven customization techniques in VQA for blind users and provides a new dataset to inform accessible interaction design.
Findings
Interactions were often lengthy and multi-turn: participants averaged 3 turns, sometimes reaching 21.
Prompt engineering helped users overcome system limitations.
The study offers insights for designing more accessible VQA systems.
Abstract
Prompting and steering techniques are well established in general-purpose generative AI, yet assistive visual question answering (VQA) tools for blind users still follow rigid interaction patterns with limited opportunities for customization. User control can be helpful when system responses are misaligned with users' goals and contexts, a gap that becomes especially consequential for blind users who may rely on these systems for access. We invite 11 blind users to customize their interactions with a real-world conversational VQA system. Drawing on 418 interactions, reflections, and post-study interviews, we analyze the prompting-based techniques participants adopted, including those introduced in the study and those developed independently in real-world settings. VQA interactions were often lengthy: participants averaged 3 turns, sometimes up to 21, with input text typically tenfold…
Taxonomy
Topics: Multimodal Machine Learning Applications · Tactile and Sensory Interactions · Social Robot Interaction and HRI
