Multimodal Surrogates for Video Browsing

Wei Ding; Gary Marchionini; Dagobert Soergel

arXiv:cs/9902019 · cs.DL · May 23, 2007

Open Access

TL;DR

This paper investigates different types of video surrogates, finding that combined visual and verbal representations enhance user comprehension and efficiency in video browsing.

Contribution

It introduces and evaluates combined visual and verbal surrogates in a qualitative study. The results favor them over single-modality surrogates for user comprehension and suggest they may require less processing time.

Findings

01 Combined surrogates improve comprehension

02 Verbal and visual reinforcement may reduce processing time

03 The study identifies the image features users found most helpful

Abstract

Three types of video surrogates - visual (keyframes), verbal (keywords/phrases), and a combination of the two - were designed and studied in a qualitative investigation of user cognitive processes. The results favor the combined surrogates, in which verbal information and images reinforce each other, lead to better comprehension, and may actually require less processing time. The results also highlight the image features users found most helpful. These findings will inform the interface design and video representation for video retrieval and browsing.

Taxonomy

Topics: Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques