Video Classification With CNNs: Using The Codec As A Spatio-Temporal Activity Sensor
Aaron Chadha, Alhabib Abbas, Yiannis Andreopoulos

TL;DR
This paper presents a fast, efficient video classification method in which CNNs operate on information taken directly from the compressed video bitstream, significantly reducing computational cost while maintaining competitive accuracy.
Contribution
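It introduces an approach that classifies video from the macroblock motion vectors and selectively decoded texture data available in compressed bitstreams, rather than from fully decoded frames followed by optical flow estimation, achieving high speed and cost efficiency.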
Findings
MV extraction is over 977 times faster than GPU-based optical flow estimation
Selective decoding is up to 12 times faster than full-frame decoding
Inference is 5 to 49 times cheaper than existing methods
Abstract
We investigate video classification via a two-stream convolutional neural network (CNN) design that directly ingests information extracted from compressed video bitstreams. Our approach begins with the observation that all modern video codecs divide the input frames into macroblocks (MBs). We demonstrate that selective access to MB motion vector (MV) information within compressed video bitstreams can also provide for selective, motion-adaptive, MB pixel decoding (a.k.a., MB texture decoding). This in turn allows for the derivation of spatio-temporal video activity regions at extremely high speed in comparison to conventional full-frame decoding followed by optical flow estimation. In order to evaluate the accuracy of a video classification framework based on such activity data, we independently train two CNN architectures on MB texture and MV correspondences and then fuse their scores…
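The two ideas in the abstract, motion-adaptive selection of macroblocks and fusion of the two streams' scores, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the function names, the motion-vector layout (one `(dx, dy)` pair per macroblock), the magnitude threshold, and the weighted-average fusion rule are hypothetical and are not taken from the paper, whose fusion details are elided above.

```python
import math

def active_macroblocks(mvs, threshold=1.0):
    """Mark macroblocks whose motion-vector magnitude exceeds `threshold`.

    `mvs` is a 2-D grid of (dx, dy) pairs, one per macroblock.
    The threshold (in pixels) is a hypothetical value, not from the paper.
    Only the marked macroblocks would need their texture decoded.
    """
    return [[math.hypot(dx, dy) > threshold for (dx, dy) in row] for row in mvs]

def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_scores(texture_logits, mv_logits, mv_weight=0.5):
    """Late fusion: weighted average of the two streams' class probabilities.

    The averaging rule and `mv_weight` are assumptions for illustration.
    """
    p_tex = softmax(texture_logits)
    p_mv = softmax(mv_logits)
    return [(1 - mv_weight) * a + mv_weight * b for a, b in zip(p_tex, p_mv)]

# Toy 2x2 grid of macroblock motion vectors.
mvs = [[(0, 0), (3, 4)], [(0.5, 0.5), (-2, 0)]]
mask = active_macroblocks(mvs)

# Toy per-class logits from a texture stream and an MV stream (3 classes).
fused = fuse_scores([2.0, 0.5, 0.1], [0.2, 1.5, 0.3])
predicted = max(range(len(fused)), key=fused.__getitem__)
```

In this toy run the mask flags only the two macroblocks with visible motion, which is the sense in which the codec's motion vectors act as an activity sensor: static regions are never decoded to pixels.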
