PyTorchVideo: A Deep Learning Library for Video Understanding
Haoqi Fan, Tullie Murrell, Heng Wang, Kalyan Vasudev Alwala, Yanghao Li, Yilei Li, Bo Xiong, Nikhila Ravi, Meng Li, Haichuan Yang, Jitendra Malik, Ross Girshick, Matt Feiszli, Aaron Adcock, Wan-Yen Lo, Christoph Feichtenhofer

TL;DR
PyTorchVideo is an open-source library built on PyTorch that offers modular, efficient components for video understanding tasks. It supports hardware-accelerated real-time inference on mobile devices and can be used with multiple training frameworks.
Contribution
PyTorchVideo provides a reproducible, hardware-accelerated toolkit for video understanding that brings data loading, transformations, and models together in one library.
Findings
Includes models that reproduce state-of-the-art performance.
Supports real-time inference on mobile devices.
Compatible with multiple training frameworks.
Abstract
We introduce PyTorchVideo, an open-source deep-learning library that provides a rich set of modular, efficient, and reproducible components for a variety of video understanding tasks, including classification, detection, self-supervised learning, and low-level processing. The library covers a full stack of video understanding tools including multimodal data loading, transformations, and models that reproduce state-of-the-art performance. PyTorchVideo further supports hardware acceleration that enables real-time inference on mobile devices. The library is based on PyTorch and can be used by any training framework; for example, PyTorchLightning, PySlowFast, or Classy Vision. PyTorchVideo is available at https://pytorchvideo.org/
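The abstract lists transformations as part of the stack. As one illustration of what a video transformation does, the following is a minimal sketch of uniform temporal subsampling, which picks a fixed number of evenly spaced frames from a clip. It is written in plain Python and is not the library's API: the function names `uniform_temporal_indices` and `subsample_clip` are hypothetical, and the integer rounding rule is an assumption made for this example.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def uniform_temporal_indices(num_frames: int, num_samples: int) -> List[int]:
    """Return num_samples frame indices spread evenly over [0, num_frames - 1].

    Hypothetical helper for illustration; not part of PyTorchVideo.
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    if num_samples == 1:
        return [0]
    # Integer arithmetic keeps the first index at 0 and the last at
    # num_frames - 1 without floating-point rounding surprises.
    last = num_frames - 1
    return [(i * last) // (num_samples - 1) for i in range(num_samples)]


def subsample_clip(frames: Sequence[T], num_samples: int) -> List[T]:
    """Select evenly spaced frames from a clip given as a sequence of frames."""
    indices = uniform_temporal_indices(len(frames), num_samples)
    return [frames[i] for i in indices]


if __name__ == "__main__":
    # A stand-in "clip" of 30 frames, represented by their frame numbers.
    clip = list(range(30))
    print(subsample_clip(clip, 8))
```

When a clip has fewer frames than requested samples, this sketch repeats frames rather than failing, which keeps the output length fixed for batching.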
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Anomaly Detection Techniques and Applications
