VIAssist: Adapting Multi-modal Large Language Models for Users with Visual Impairments
Bufang Yang, Lixing He, Kaiwei Liu, Zhenyu Yan

TL;DR
VIAssist is a system that adapts multi-modal large language models (MLLMs) to assist visually impaired users. It identifies captured images that are unsuitable for answering a request, suggests detailed actions, and provides reliable answers to visual questions.
Contribution
The paper introduces VIAssist, an approach for adapting MLLMs to VI users that supports visual question answering even when the captured image does not fully contain the target object.
Findings
VIAssist outperforms the baseline by +0.21 BERTScore
VIAssist outperforms the baseline by +0.31 ROUGE score
Provides detailed actions and reliable answers for VI users
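For readers unfamiliar with the overlap metrics reported above, the following is a minimal sketch of a ROUGE-1 F1 score between a generated answer and a reference answer. The summary does not say which ROUGE variant or tokenizer the paper uses, so the unigram variant, the whitespace tokenization, and the function name `rouge1_f1` are assumptions made for illustration only, not the authors' evaluation code.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a candidate answer and a reference answer.

    Assumption: lowercase whitespace tokenization, no stemming.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Clipped overlap: each word counts at most as often as it appears in both.
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    generated = "move the camera left"
    expected = "move the camera right"
    print(rouge1_f1(generated, expected))
```

A higher score means the generated answer shares more words with the reference. BERTScore follows the same precision/recall idea but compares contextual embeddings rather than exact word matches.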
Abstract
Visually impaired (VI) people are individuals with partial or total difficulty in visual perception. An estimated 2.2 billion individuals worldwide are affected by visual impairments. Recent advances in multi-modal large language models (MLLMs) have demonstrated strong capabilities across various domains, and their visual understanding and reasoning could help VI individuals. However, VI people find MLLMs difficult to use because it is hard for them to capture images suited to their daily requests. For example, the target object may appear only partially in the image or not at all. This paper explores how to leverage MLLMs so that VI individuals can obtain answers to visual questions. VIAssist can identify undesired images and provide detailed actions. Finally, VIAssist can…
Taxonomy
Topics: Digital Accessibility for Disabilities · Text Readability and Simplification · Multimodal Machine Learning Applications
