NarrationBot and InfoBot: A Hybrid System for Automated Video Description

Shasta Ihorn; Yue-Ting Siu; Aditya Bodi; Lothar Narins; Jose M. Castanon; Yash Kant; Abhishek Das; Ilmi Yoon; Pooyan Fazli

arXiv:2111.03994 · cs.HC · January 12, 2022

Open Access

TL;DR

NarrationBot and InfoBot are a hybrid system that automatically generates and enhances video descriptions, significantly improving accessibility for blind and low vision users and enabling more efficient video content engagement.

Contribution

The paper introduces a novel hybrid system combining automatic video description generation and interactive querying, enhancing accessibility and user experience for visually impaired viewers.

Findings

01. System improved user comprehension and enjoyment

02. No significant difference between autogenerated and human-revised descriptions

03. High user enthusiasm for the system

Abstract

Video accessibility is crucial for blind and low vision users for equitable engagements in education, employment, and entertainment. Despite the availability of professional and amateur services and tools, most human-generated descriptions are expensive and time consuming. Moreover, the rate of human-generated descriptions cannot match the speed of video production. To overcome the increasing gaps in video accessibility, we developed a hybrid system of two tools to 1) automatically generate descriptions for videos and 2) provide answers or additional descriptions in response to user queries on a video. Results from a mixed-methods study with 26 blind and low vision individuals show that our system significantly improved user comprehension and enjoyment of selected videos when both tools were used in tandem. In addition, participants reported no significant difference in their ability to…

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Video Analysis and Summarization

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings