Role of Audio in Audio-Visual Video Summarization

Ibrahim Shoer; Berkay Kopru; Engin Erzin

arXiv:2212.01040 · cs.CV · December 5, 2022 · 1 citation

Open Access

TL;DR

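This paper introduces a new audio-visual video summarization framework that combines four audio-visual fusion strategies with GRU-based and attention-based networks and uses audio-visual canonical correlation analysis (CCA) for explainability, reporting improved F1 and Kendall-tau scores on the TVSum dataset.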

Contribution

It proposes audio-visual fusion approaches built on GRU-based and attention-based networks, along with an explainability method based on audio-visual canonical correlation analysis (CCA) that helps clarify the role of audio in video summarization.
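As a rough illustration of what an audio-visual CCA measures, the sketch below computes the first canonical correlation between a set of audio features and a set of visual features. It is a generic, dependency-free sketch and not the authors' implementation: it assumes per-frame audio features X (n frames x p dimensions) and visual features Y (n x q) are given as equal-length lists of rows, and the function name `first_canonical_correlation`, its `ridge` regularizer, and the power-iteration solver are all illustrative choices. The squared canonical correlations are the eigenvalues of Sxx^-1 Sxy Syy^-1 Syx, so the code returns the square root of the dominant one.

```python
import math
import random


def _center(rows):
    """Subtract each column's mean from every row (rows = frames, columns = features)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def _cov(a, b, ridge=0.0):
    """Sample cross-covariance a^T b / (n - 1) of centered data, plus ridge on the diagonal."""
    n = len(a)
    cov = [[sum(a[k][i] * b[k][j] for k in range(n)) / (n - 1)
            for j in range(len(b[0]))] for i in range(len(a[0]))]
    for i in range(min(len(cov), len(cov[0]))):
        cov[i][i] += ridge
    return cov


def _matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def _inverse(m):
    """Gauss-Jordan inverse with partial pivoting (fine for small feature dimensions)."""
    size = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(size)]
           for i, row in enumerate(m)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def first_canonical_correlation(x_rows, y_rows, ridge=1e-8, iters=500):
    """Largest canonical correlation between X (n x p) and Y (n x q) feature sets."""
    x, y = _center(x_rows), _center(y_rows)
    sxx_inv = _inverse(_cov(x, x, ridge))
    syy_inv = _inverse(_cov(y, y, ridge))
    sxy = _cov(x, y)
    syx = [list(col) for col in zip(*sxy)]  # transpose of Sxy
    # Eigenvalues of M = Sxx^-1 Sxy Syy^-1 Syx are the squared canonical correlations.
    m = _matmul(_matmul(sxx_inv, sxy), _matmul(syy_inv, syx))
    v = [1.0] * len(m)
    rho_sq = 0.0
    for _ in range(iters):  # power iteration for the dominant eigenvalue
        w = [sum(a * b for a, b in zip(row, v)) for row in m]
        norm_w = math.sqrt(sum(c * c for c in w))
        if norm_w == 0.0:
            return 0.0
        rho_sq = norm_w / math.sqrt(sum(c * c for c in v))
        v = [c / norm_w for c in w]
    return math.sqrt(min(1.0, rho_sq))


if __name__ == "__main__":
    rng = random.Random(0)
    audio = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
    # Visual dimension 0 copies audio dimension 0, so the top correlation should be ~1.
    visual = [[a[0], rng.gauss(0, 1)] for a in audio]
    print(first_canonical_correlation(audio, visual))
```

Power iteration is used only to keep the sketch free of dependencies; with real, higher-dimensional audio and visual embeddings, a linear-algebra library's CCA or SVD routine would be the more practical choice.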

Findings

01. Improved F1 and Kendall-tau scores on the TVSum dataset.
02. Enhanced performance on videos whose audio and visual content are positively correlated.
03. Effective use of CCA for explainability in video summarization.

Abstract

Video summarization has attracted attention as a way to provide efficient video representation, retrieval, and browsing, easing the problems caused by surging video volume and traffic. Although video summarization mostly relies on the visual channel for compaction, the benefits of audio-visual modeling have appeared in recent literature. The information contributed by the audio channel can result from audio-visual correlation in the video content. In this study, we propose a new audio-visual video summarization framework that integrates four ways of fusing audio-visual information with GRU-based and attention-based networks. Furthermore, we investigate a new explainability methodology that uses audio-visual canonical correlation analysis (CCA) to better understand and explain the role of audio in the video summarization task. Experimental evaluations on the TVSum dataset attain F1 score and Kendall-tau score improvements for the audio-visual…
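The abstract reports results as F1 and Kendall-tau scores. As a hedged illustration of how these two metrics are commonly computed in the video summarization literature (not necessarily the exact evaluation code used in this paper), the sketch below computes a frame-level F1 between a predicted binary summary mask and one annotator's summary mask, and Kendall's tau-b between predicted frame importance scores and human-annotated scores. The function names `summary_f1` and `kendall_tau_b` are hypothetical, and full benchmark protocols additionally select keyshots under a length budget and aggregate over several annotators, which is omitted here.

```python
import math


def summary_f1(pred, gt):
    """Frame-level F1 between two binary keyframe masks of equal length."""
    if len(pred) != len(gt):
        raise ValueError("masks must have equal length")
    overlap = sum(1 for p, g in zip(pred, gt) if p and g)
    if overlap == 0:
        return 0.0  # also avoids dividing by zero for empty masks
    precision = overlap / sum(1 for p in pred if p)
    recall = overlap / sum(1 for g in gt if g)
    return 2 * precision * recall / (precision + recall)


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation (tie-corrected), O(n^2) for clarity."""
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = (x[i] > x[j]) - (x[i] < x[j])  # sign of the x difference
            dy = (y[i] > y[j]) - (y[i] < y[j])  # sign of the y difference
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    n0 = n * (n - 1) // 2
    denom = math.sqrt((n0 - ties_x) * (n0 - ties_y))
    return (concordant - discordant) / denom if denom else 0.0


if __name__ == "__main__":
    pred = [0, 1, 1, 1, 0, 0]  # hypothetical predicted keyframe mask
    user = [0, 0, 1, 1, 1, 0]  # hypothetical annotator summary mask
    print(summary_f1(pred, user))
    print(kendall_tau_b([0.1, 0.4, 0.35, 0.8], [1, 3, 2, 5]))
```

In the example, the two masks share two of their three selected frames, so precision, recall, and F1 all equal 2/3, and the predicted scores rank the four frames in the same order as the annotations, giving a tau-b of 1. The tie-corrected tau-b variant is chosen here because coarse importance annotations typically contain many ties.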

Taxonomy

Topics: Music and Audio Processing · Video Analysis and Summarization · Digital Media Forensic Detection