Training a Large Video Model on a Single Machine in a Day
Yue Zhao, Philipp Krähenbühl

TL;DR
This paper presents a highly efficient method to train large-scale video models on a single machine with eight consumer GPUs within a day, significantly reducing computational requirements while improving accuracy.
Contribution
It introduces optimized techniques addressing IO, CPU, and GPU bottlenecks, enabling state-of-the-art video model training on limited hardware in a short time.
Findings
Achieves higher accuracy than prior work with one-eighth of the computation, for comparable architectures.
Trains a large video model on a single machine within a day.
Provides an open-source codebase for reproducibility.
Abstract
Videos are big, complex to pre-process, and slow to train on. State-of-the-art large-scale video models are trained on clusters of 32 or more GPUs for several days. As a consequence, academia largely ceded the training of large video models to industry. In this paper, we show how to still train a state-of-the-art video model on a single machine with eight consumer-grade GPUs in a day. We identify three bottlenecks, IO, CPU, and GPU computation, and optimize each. The result is a highly efficient video training pipeline. For comparable architectures, our pipeline achieves higher accuracies with 1/8 of the computation compared to prior work. Code is available at https://github.com/zhaoyue-zephyrus/AVION.
Taxonomy
Topics: Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications
