AMAVA: Adaptive Motion-Aware Video-to-Audio Framework for Visually-Impaired Assistance

Benjamin Klein, Kazi Ruslan Rahman, and Sanchita Ghose

arXiv:2604.23909·cs.CV·April 28, 2026

TL;DR

AMAVA is a real-time video-to-audio system designed to assist visually impaired users by providing contextually relevant sound cues and descriptions, improving environmental awareness and safety.

Contribution

It introduces a motion-aware, AI-driven framework that dynamically switches between scene descriptions and hazard alerts to enhance navigation for the visually impaired.

Findings

01. User confidence increased significantly with AMAVA.

02. AMAVA effectively distinguishes between static and dynamic scenes.

03. The system reduces auditory clutter and latency in real-time navigation.

Abstract

Navigational aids for blind and low-vision individuals struggle to convey dynamic real-world environments, leading to cognitive overload from continuous, undifferentiated feedback. We present AMAVA, a novel real-time video-to-audio framework that converts mobile device video into contextually relevant sound effects or text-to-speech descriptions. We propose a motion-aware pipeline that uses a lightweight AI classification model to distinguish between low- and high-movement scenes, followed by a real-time text-to-audio synthesis pipeline, to enhance environmental perception more efficiently. In static environments, AMAVA generates spoken audio scene descriptions for situational awareness. In high-movement situations, it prioritizes safety by delivering sound cues, such as spoken hazard alerts and environmental sound effects. These audio outputs are produced by a decoder-only transformer-based…
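The mode switch described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the paper uses a lightweight AI classification model, while the sketch below substitutes a plain frame-differencing heuristic so that it stays self-contained. The names `motion_score`, `select_mode`, and `MOTION_THRESHOLD`, the threshold value, and the frame representation (flattened grayscale values in [0, 1]) are all assumptions made for illustration.

```python
from typing import List, Sequence

# Assumption: a frame is a flattened grayscale image with values in [0, 1].
Frame = Sequence[float]

# Hypothetical cut-off between low and high movement; not taken from the paper.
MOTION_THRESHOLD = 0.1


def motion_score(prev: Frame, curr: Frame) -> float:
    """Mean absolute per-pixel difference between two equal-sized frames."""
    if len(prev) != len(curr) or not prev:
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(a - b) for a, b in zip(prev, curr)) / len(prev)


def select_mode(frames: List[Frame], threshold: float = MOTION_THRESHOLD) -> str:
    """Pick an output mode for a short clip.

    Stand-in for the paper's learned classifier: average the motion score
    over consecutive frame pairs and compare it with a fixed threshold.
    """
    if len(frames) < 2:
        # A single frame carries no motion information; treat it as static.
        return "scene_description"
    scores = [motion_score(a, b) for a, b in zip(frames, frames[1:])]
    average = sum(scores) / len(scores)
    return "hazard_alert" if average > threshold else "scene_description"


if __name__ == "__main__":
    static_clip = [[0.5, 0.5, 0.5, 0.5]] * 3
    moving_clip = [[0.0] * 4, [1.0] * 4, [0.0] * 4]
    print(select_mode(static_clip))
    print(select_mode(moving_clip))
```

In a real pipeline the returned mode would route the clip either to a spoken scene description or to hazard alerts and sound effects; the fixed threshold here only marks where the learned low/high-movement decision would sit.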
