Efficient Large Scale Video Classification
Balakrishnan Varadarajan, George Toderici, Sudheendra, Vijayanarasimhan, Apostol Natsev

TL;DR
This paper introduces scalable methods for large-scale video classification by leveraging image-based CNN features and efficient models like mixture of experts and LSTMs, enabling processing of millions of videos with reduced computational costs.
Contribution
It proposes novel scalable approaches for video classification that bypass intensive frame-based CNN training by using image-derived features and efficient models for large datasets.
Findings
Achieved state-of-the-art results on Sports-1M dataset.
Successfully scaled to 12 million videos with 150,000 labels.
Demonstrated low computational cost for large-scale video classification.
Abstract
Video classification has advanced tremendously over the recent years. A large part of the improvements in video classification had to do with the work done by the image classification community and the use of deep convolutional networks (CNNs) which produce competitive results with hand- crafted motion features. These networks were adapted to use video frames in various ways and have yielded state of the art classification results. We present two methods that build on this work, and scale it up to work with millions of videos and hundreds of thousands of classes while maintaining a low computational cost. In the context of large scale video processing, training CNNs on video frames is extremely time consuming, due to the large number of frames involved. We propose to avoid this problem by training CNNs on either YouTube thumbnails or Flickr images, and then using these networks' outputs…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsHuman Pose and Action Recognition · Video Analysis and Summarization · Multimodal Machine Learning Applications
