Towards Blind and Low-Vision Accessibility of Lightweight VLMs and Custom LLM-Evals

Shruti Singh Baghel; Yash Pratap Singh Rathore; Sushovan Jena; Anurag Pradhan; Amit Shukla; Arnav Bhavsar; Pawan Goyal

arXiv:2511.10615 · cs.CV · November 14, 2025

TL;DR

This paper evaluates lightweight vision-language models for blind and low-vision accessibility, introduces new assessment frameworks, and tests model performance on mobile devices to improve practical usability.

Contribution

It presents novel evaluation frameworks tailored for BLV accessibility and systematically assesses model size, prompt strategies, and deployment on mobile hardware.

Findings

01 SmolVLM2 models perform well on accessibility tasks

02 New frameworks effectively evaluate spatial and mobility information

03 Models can be deployed on smartphones with optimized precision

Abstract

Large Vision-Language Models (VLMs) excel at understanding and generating video descriptions, but their high memory, computation, and deployment demands hinder practical use, particularly for blind and low-vision (BLV) users who depend on detailed, context-aware descriptions. To study the effect of model size on accessibility-focused description quality, we evaluate SmolVLM2 variants with 500M and 2.2B parameters across two diverse datasets: AVCaps (outdoor) and Charades (indoor). We introduce two novel evaluation frameworks specifically designed for BLV accessibility assessment: the Multi-Context BLV Framework, which evaluates spatial orientation, social interaction, action events, and ambience contexts; and the Navigational Assistance Framework, which focuses on mobility-critical information. Additionally, we conduct a systematic evaluation of four different prompt design strategies…
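
To give a rough sense of why model size and numeric precision matter for on-device use, the sketch below estimates weight storage for the two parameter counts quoted in the abstract (500M and 2.2B). It is back-of-the-envelope arithmetic only. The precision list (fp32, fp16, int8, int4) and the model labels are assumptions for illustration; the text above does not say which precisions the authors tested.

```python
# Bytes per parameter for some common numeric precisions.
# ASSUMPTION: these are illustrative candidates; the source only says
# "optimized precision" and does not name the formats evaluated.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}

# Parameter counts quoted in the abstract. The dictionary keys are
# hypothetical labels chosen for this example.
MODEL_PARAMS = {"SmolVLM2 (500M)": 500e6, "SmolVLM2 (2.2B)": 2.2e9}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate storage for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def footprint_table() -> dict:
    """Map each model label to its estimated weight size per precision."""
    return {
        name: {p: weight_memory_gb(n, p) for p in BYTES_PER_PARAM}
        for name, n in MODEL_PARAMS.items()
    }


if __name__ == "__main__":
    for name, row in footprint_table().items():
        cells = ", ".join(f"{p}: {gb:.2f} GB" for p, gb in row.items())
        print(f"{name} -> {cells}")
```

These figures cover weights only. Activations, the key-value cache, and runtime overhead add to the total, so actual memory use on a phone would be higher than the estimate.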

Taxonomy

Topics: Tactile and Sensory Interactions · Multimodal Machine Learning Applications · Subtitles and Audiovisual Media