@Bench: Benchmarking Vision-Language Models for Human-centered Assistive Technology

Xin Jiang; Junwei Zheng; Ruiping Liu; Jiahang Li; Jiaming Zhang; Sven Matthiesen; and Rainer Stiefelhagen

arXiv:2409.14215 · cs.CV · November 26, 2024

TL;DR

This paper introduces @Bench, a comprehensive benchmark for evaluating vision-language models in assistive technology for visually impaired people, along with a new multi-task model that improves assistance capabilities.

Contribution

The paper presents a novel benchmark (@Bench) for human-centered assistive tasks and a new multi-task model (@Model) that addresses multiple vision-language tasks simultaneously.

Findings

01

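The benchmark covers five assistive tasks, chosen with guidance from a pre-design user study with people with visual impairments: Panoptic Segmentation, Depth Estimation, Optical Character Recognition (OCR), Image Captioning, and Visual Question Answering (VQA).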

02

The proposed model outperforms existing methods across tasks.

03

Experiments demonstrate the model's effectiveness and generalizability.

Abstract

As Vision-Language Models (VLMs) advance, human-centered Assistive Technologies (ATs) for helping People with Visual Impairments (PVIs) are evolving into generalists, capable of performing multiple tasks simultaneously. However, benchmarking VLMs for ATs remains under-explored. To bridge this gap, we first create a novel AT benchmark (@Bench). Guided by a pre-design user study with PVIs, our benchmark includes the five most crucial vision-language tasks: Panoptic Segmentation, Depth Estimation, Optical Character Recognition (OCR), Image Captioning, and Visual Question Answering (VQA). Besides, we propose a novel AT model (@Model) that addresses all tasks simultaneously and can be expanded to more assistive functions for helping PVIs. Our framework exhibits outstanding performance across tasks by integrating multi-modal information, and it offers PVIs a more comprehensive assistance.…

Taxonomy

Topics: Assistive Technology in Communication and Mobility · Digital Accessibility for Disabilities · Smart Cities and Technologies