Text-Driven Video Acceleration: A Weakly-Supervised Reinforcement Learning Method

Washington Ramos; Michel Silva; Edson Araujo; Victor Moura; Keller Oliveira; Leandro Soriano Marcolino; Erickson R. Nascimento

arXiv:2203.15778 · cs.CV · March 30, 2022

TL;DR

This paper introduces a weakly-supervised reinforcement learning approach that uses text to guide the acceleration of instructional videos, producing shorter versions that preserve context instead of leaving visual gaps.

Contribution

The paper proposes a joint reward function that guides an agent in choosing which frames to remove, and the Extended Visually-guided Document Attention Network (VDAN+), which builds an embedding space for both textual and visual data.

Findings

1. Achieves superior Precision, Recall, and F1 scores compared to baselines.

2. Effectively controls output video length without visual gaps.

3. Demonstrates robustness in accelerating instructional videos.
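
For readers unfamiliar with the metrics named in the first finding, the following is a minimal sketch of frame-level Precision, Recall, and F1 for a frame-selection method. The function name `frame_selection_scores` and the set-of-indices representation are assumptions made for illustration; the paper's actual evaluation protocol is not described on this page and may differ.

```python
def frame_selection_scores(selected, relevant):
    """Return (precision, recall, f1) for selected vs. relevant frame indices.

    Hypothetical helper for illustration only, not the paper's evaluation code.
    """
    selected, relevant = set(selected), set(relevant)
    true_positives = len(selected & relevant)

    # Precision: fraction of selected frames that are relevant.
    precision = true_positives / len(selected) if selected else 0.0
    # Recall: fraction of relevant frames that were selected.
    recall = true_positives / len(relevant) if relevant else 0.0
    # F1: harmonic mean of precision and recall (0 when both are 0).
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Toy example: 4 frames kept, 6 frames annotated as relevant, 2 in common.
precision, recall, f1 = frame_selection_scores([0, 1, 2, 3], [2, 3, 4, 5, 6, 7])
print(precision, recall, f1)
```

In this toy case half of the kept frames are relevant and one third of the relevant frames are kept, so F1 sits between the two at 0.4.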

Abstract

The growth of videos in our digital age and the users' limited time raise the demand for processing untrimmed videos to produce shorter versions conveying the same information. Despite the remarkable progress that summarization methods have made, most of them can only select a few frames or skims, creating visual gaps and breaking the video context. This paper presents a novel weakly-supervised methodology based on a reinforcement learning formulation to accelerate instructional videos using text. A novel joint reward function guides our agent to select which frames to remove and reduce the input video to a target length without creating gaps in the final video. We also propose the Extended Visually-guided Document Attention Network (VDAN+), which can generate a highly discriminative embedding space to represent both textual and visual data. Our experiments show that our method achieves…
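
To make the idea of a joint reward concrete, here is a toy sketch that combines a semantic term with a length term. Everything in it is an assumption for illustration: the cosine-similarity semantic term, the normalized length penalty, and the names `joint_reward`, `cosine_similarity`, and `length_weight` are hypothetical, and the reward actually used in the paper is not given in this abstract.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def joint_reward(frame_embedding, text_embedding, num_selected, target_length,
                 length_weight=1.0):
    """Toy joint reward: semantic agreement minus a penalty for missing the target length.

    Assumed form for illustration only, not the paper's reward function.
    """
    # Semantic term: how well the kept frame matches the text in embedding space.
    semantic_term = cosine_similarity(frame_embedding, text_embedding)
    # Length term: zero at the target length, increasingly negative away from it.
    length_term = -abs(num_selected - target_length) / target_length
    return semantic_term + length_weight * length_term


# A frame aligned with the text, with the output exactly at the target length.
print(joint_reward([1.0, 0.0], [1.0, 0.0], num_selected=10, target_length=10))
# A frame unrelated to the text, with the output 50% longer than the target.
print(joint_reward([1.0, 0.0], [0.0, 1.0], num_selected=15, target_length=10))
```

The point of combining both terms in one scalar is that the agent cannot maximize relevance by keeping every frame, nor hit the target length by discarding frames that match the text.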

Code & Models

Repositories

verlab/TextDrivenVideoAcceleration_TPAMI_2022
PyTorch · Official