Training a Large Video Model on a Single Machine in a Day

Yue Zhao, Philipp Krähenbühl

arXiv:2309.16669 · cs.CV · September 29, 2023 · 5 cites

TL;DR

This paper presents an efficient pipeline for training a large-scale video model on a single machine with eight consumer-grade GPUs within a day. For comparable architectures, it reaches higher accuracy with one-eighth of the computation of prior work.

Contribution

The paper identifies three bottlenecks in video training (IO, CPU, and GPU computation) and optimizes each, making state-of-the-art video model training feasible on a single eight-GPU machine in a day.

Findings

01. Achieves higher accuracy than prior work with one-eighth of the computation, for comparable architectures.

02. Trains a large video model on a single machine with eight consumer-grade GPUs within a day.

03. Releases the training code as open source for reproducibility.

Abstract

Videos are big, complex to pre-process, and slow to train on. State-of-the-art large-scale video models are trained on clusters of 32 or more GPUs for several days. As a consequence, academia largely ceded the training of large video models to industry. In this paper, we show how to still train a state-of-the-art video model on a single machine with eight consumer-grade GPUs in a day. We identify three bottlenecks, IO, CPU, and GPU computation, and optimize each. The result is a highly efficient video training pipeline. For comparable architectures, our pipeline achieves higher accuracies with $\frac{1}{8}$ of the computation compared to prior work. Code is available at https://github.com/zhaoyue-zephyrus/AVION.

Code & Models

Repositories

zhaoyue-zephyrus/avion
PyTorch · Official

Taxonomy

Topics: Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications