PGT: A Progressive Method for Training Models on Long Videos

Bo Pang; Gao Peng; Yizhuo Li; Cewu Lu

arXiv:2103.11313·cs.CV·March 23, 2021·1 cites

TL;DR

The paper introduces PGT, a progressive training method that trains models end-to-end on long videos by propagating information sequentially through time. This avoids the fragmented temporal information flow of clip-based training while staying within limited computational resources.

Contribution

The paper proposes a progressive training approach, inspired by NLP techniques for handling long sentences, that trains models on long videos end-to-end with limited computational resources.

Findings

01 Improves the SlowOnly network by 3.7 mAP on Charades

02 Increases top-1 accuracy by 1.9 on Kinetics

03 Achieves significant performance gains with negligible overhead

Abstract

Convolutional video models have an order of magnitude larger computational complexity than their image-level counterparts. Constrained by computational resources, there is no model or training method that can train long video sequences end-to-end. Currently, the mainstream method is to split a raw video into clips, leading to incomplete, fragmentary temporal information flow. Inspired by natural language processing techniques for dealing with long sentences, we propose to treat videos as serial fragments satisfying the Markov property, and to train them as a whole by progressively propagating information through the temporal dimension in multiple steps. This progressive training (PGT) method is able to train long videos end-to-end with limited resources and ensures the effective transmission of information. As a general and robust training method, we empirically demonstrate that it yields…
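The idea of treating a video as serial fragments and carrying information forward step by step can be illustrated with a small sketch. This is not the authors' implementation: `split_into_fragments`, `toy_step`, and `progressive_pass` are hypothetical names, frames are stood in for by plain floats, and the per-fragment "model" is a simple running average chosen only to make the data flow visible. It does not reproduce the paper's architecture, loss, or gradient handling.

```python
from typing import Callable, List, Sequence, Tuple

Frame = float  # stand-in for a video frame; a real frame would be an image tensor
State = float  # stand-in for the information carried between fragments


def split_into_fragments(frames: Sequence[Frame], fragment_len: int) -> List[List[Frame]]:
    """Cut a long frame sequence into consecutive, non-overlapping fragments."""
    if fragment_len <= 0:
        raise ValueError("fragment_len must be positive")
    return [list(frames[i:i + fragment_len]) for i in range(0, len(frames), fragment_len)]


def toy_step(fragment: List[Frame], carried: State) -> Tuple[float, State]:
    """Hypothetical per-fragment step: blend the fragment mean with the carried state."""
    fragment_mean = sum(fragment) / len(fragment)
    new_state = 0.5 * carried + 0.5 * fragment_mean
    # Return (prediction for this fragment, state handed to the next fragment).
    return new_state, new_state


def progressive_pass(
    frames: Sequence[Frame],
    fragment_len: int,
    step: Callable[[List[Frame], State], Tuple[float, State]] = toy_step,
    initial_state: State = 0.0,
) -> List[float]:
    """Process a long sequence one fragment at a time, propagating state forward."""
    state = initial_state
    predictions = []
    for fragment in split_into_fragments(frames, fragment_len):
        # Markov-style assumption: each step sees only the current fragment
        # and the state produced by the previous step.
        prediction, state = step(fragment, state)
        predictions.append(prediction)
    return predictions


if __name__ == "__main__":
    video = [1.0] * 4 + [3.0] * 4  # eight toy "frames"
    print(progressive_pass(video, fragment_len=4))
```

Only one fragment is handled per step, which is the property that keeps the working set small. In a real training loop each step would also compute a loss and update the model, and how gradients are treated across fragment boundaries is a design decision that this sketch leaves out.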

Code & Models

Repositories

BoPang1996/PGT (PyTorch, official)

Taxonomy

Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Advanced Vision and Imaging