VisAug: Facilitating Speech-Rich Web Video Navigation and Engagement with Auto-Generated Visual Augmentations

Baoquan Zhao; Xiaofan Ma; Qianshi Pang; Ruomei Wang; Fan Zhou; Shujin Lin

arXiv:2508.03410 · cs.MM · August 6, 2025

TL;DR

VisAug is an interactive system that automatically generates visual augmentations from a video's speech content to improve navigation and engagement in speech-rich videos, where existing visual-based summarization methods fall short.

Contribution

The paper introduces VisAug, an interactive system that enhances navigation of and engagement with speech-rich videos by generating informative and expressive visual augmentations from their speech content.

Findings

01. Potential to significantly improve video content engagement

02. Enhances navigation in speech-rich videos

03. Addresses limitations of visual-only summarization

Abstract

The widespread adoption of digital technology has ushered in a new era of digital transformation across all aspects of our lives. Online learning, social, and work activities, such as distance education, videoconferencing, interviews, and talks, have led to a dramatic increase in speech-rich video content. In contrast to other video types, such as surveillance footage, which typically contain abundant visual cues, speech-rich videos convey most of their meaningful information through the audio channel. This poses challenges for improving content consumption using existing visual-based video summarization, navigation, and exploration systems. In this paper, we present VisAug, a novel interactive system designed to enhance speech-rich video navigation and engagement by automatically generating informative and expressive visual augmentations based on the speech content of videos. Our…
