Multimodal Surrogates for Video Browsing
Wei Ding, Gary Marchionini, Dagobert Soergel

TL;DR
This paper compares visual, verbal, and combined video surrogates in a qualitative study, finding that combined representations improve user comprehension and may reduce processing time during video browsing.
Contribution
It designs and evaluates combined visual and verbal surrogates in a qualitative user study; the results favor them over single-modality surrogates for comprehension and suggest they may require less processing time.
Findings
Combined surrogates improve comprehension
Verbal and visual reinforcement may reduce processing time
Certain image features are identified as most helpful to users
Abstract
Three types of video surrogates - visual (keyframes), verbal (keywords/phrases), and a combination of the two - were designed and studied in a qualitative investigation of user cognitive processes. The results favor the combined surrogates, in which verbal information and images reinforce each other, lead to better comprehension, and may actually require less processing time. The results also highlight the image features users found most helpful. These findings will inform interface design and video representation for video retrieval and browsing.
Taxonomy
Topics: Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
