Scaling Up Video Summarization Pretraining with Large Language Models

Dawit Mureja Argaw; Seunghyun Yoon; Fabian Caba Heilbron; Hanieh Deilamsalehy; Trung Bui; Zhaowen Wang; Franck Dernoncourt; Joon Son Chung

arXiv:2404.03398 · cs.CV · April 5, 2024 · 1 citation

Open Access

TL;DR

This paper introduces a large-scale dataset and a new model for video summarization, leveraging large language models to improve generalization and set new state-of-the-art results.

Contribution

It presents an automated pipeline for creating a large video summarization dataset using LLMs and proposes a novel model that outperforms existing methods.

Findings

01. Achieved state-of-the-art results on multiple benchmarks.

02. Created a new benchmark dataset with professional annotations.

03. Demonstrated the effectiveness of LLMs in generating training data.

Abstract

Long-form video content constitutes a significant portion of internet traffic, making automated video summarization an essential research problem. However, existing video summarization datasets are notably limited in their size, constraining the effectiveness of state-of-the-art methods for generalization. Our work aims to overcome this limitation by capitalizing on the abundance of long-form videos with dense speech-to-video alignment and the remarkable capabilities of recent large language models (LLMs) in summarizing long text. We introduce an automated and scalable pipeline for generating a large-scale video summarization dataset using LLMs as Oracle summarizers. By leveraging the generated dataset, we analyze the limitations of existing approaches and propose a new video summarization model that effectively addresses them. To facilitate further research in the field, our work also…
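As a rough illustration of the idea in the abstract (an oracle text summarizer picks transcript sentences, and the speech-to-video alignment turns those picks into video segments), here is a minimal Python sketch. It is not the authors' pipeline: the `Sentence` type, `pseudo_summary_segments`, the `max_gap` parameter, and `longest_sentences_oracle` are all hypothetical names introduced for this example, and the length-based oracle is only a stand-in for an LLM call.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Sentence:
    """One transcript sentence with its aligned time span in seconds (assumed format)."""
    text: str
    start: float
    end: float


def pseudo_summary_segments(
    transcript: List[Sentence],
    oracle: Callable[[List[str]], List[int]],
    max_gap: float = 0.0,
) -> List[Tuple[float, float]]:
    """Map the sentence indices chosen by `oracle` to merged (start, end) video segments.

    `oracle` is a hypothetical callable standing in for an LLM summarizer: it
    receives the sentence texts and returns indices of the sentences to keep.
    """
    chosen = oracle([s.text for s in transcript])
    # Deduplicate, drop out-of-range indices, and restore temporal order.
    picked = sorted({i for i in chosen if 0 <= i < len(transcript)})
    segments: List[Tuple[float, float]] = []
    for i in picked:
        s = transcript[i]
        if segments and s.start - segments[-1][1] <= max_gap:
            # Extend the previous segment when the two spans touch or nearly touch.
            segments[-1] = (segments[-1][0], max(segments[-1][1], s.end))
        else:
            segments.append((s.start, s.end))
    return segments


def longest_sentences_oracle(k: int) -> Callable[[List[str]], List[int]]:
    """Toy stand-in for an LLM oracle: keep the k longest sentences."""
    def pick(texts: List[str]) -> List[int]:
        ranked = sorted(range(len(texts)), key=lambda i: len(texts[i]), reverse=True)
        return ranked[:k]
    return pick


transcript = [
    Sentence("hi", 0.0, 1.0),
    Sentence("this is a long sentence", 1.0, 3.0),
    Sentence("another long one here", 3.0, 5.0),
    Sentence("ok", 5.0, 6.0),
]
segments = pseudo_summary_segments(transcript, longest_sentences_oracle(2))
```

In this toy run the two longest sentences are adjacent in time, so they merge into a single segment. A real oracle would replace `longest_sentences_oracle` with a call to a language model that returns the indices of the sentences it selects.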

Taxonomy

Topics: Natural Language Processing Techniques · Video Analysis and Summarization · Topic Modeling