Mimic The Raw Domain: Accelerating Action Recognition in the Compressed Domain
Barak Battash, Haim Barad, Hanlin Tang, Amit Bleiweiss

TL;DR
This paper introduces a novel method for action recognition directly in the compressed video domain, achieving near state-of-the-art accuracy with significantly fewer parameters and computational resources by leveraging residual frames and a teacher-student training approach.
Contribution
It proposes a new approach that treats compressed video data as a single unit and uses residuals to replace raw RGB frames, enabling efficient recognition without raw video processing.
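As a rough illustration of what a residual is in a compressed stream: a predicted (P) frame is stored as motion vectors plus a residual, the difference between the actual frame and its motion-compensated prediction. The sketch below assumes tiny grayscale frames held as nested lists and ignores the transform and quantization steps a real codec applies. The names `residual_frame` and `reconstruct` are hypothetical and do not come from the paper.

```python
# Minimal sketch (assumption: frames are small grayscale grids of ints,
# and the motion-compensated prediction has already been computed).

def residual_frame(current, predicted):
    """Signed per-pixel difference between the actual frame and its prediction."""
    return [[c - p for c, p in zip(c_row, p_row)]
            for c_row, p_row in zip(current, predicted)]

def reconstruct(predicted, residual):
    """A decoder recovers the frame by adding the residual back to the prediction."""
    return [[p + r for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(predicted, residual)]

predicted = [[10, 10, 10], [20, 20, 20]]   # motion-compensated guess
current = [[10, 12, 10], [20, 20, 17]]     # actual frame
residual = residual_frame(current, predicted)
```

The residual is mostly zeros wherever the prediction is good, so it is far sparser than the raw frame while still marking where the content changed.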
Findings
Achieves near state-of-the-art accuracy on the HMDB51, UCF101, and Kinetics datasets.
The MFCD-Net model has 11× fewer parameters and 3× fewer FLOPs than previous methods.
Enables efficient video recognition solely in the compressed domain.
Abstract
Video understanding usually requires expensive computation that prohibits its deployment, yet videos contain significant spatiotemporal redundancy that can be exploited. In particular, operating directly on the motion vectors and residuals in the compressed video domain can significantly accelerate computation by avoiding the raw videos, which demand colossal storage capacity. Existing methods treat this task as a multi-modality problem. In this paper we approach the task in a completely different way: we treat the data from the compressed stream as a single clip and propose that the residual frames can replace the original RGB frames from the raw domain. Furthermore, we use a teacher-student method to help the network in the compressed domain mimic the teacher network in the raw domain. We show experiments on three leading datasets (HMDB51, UCF101, and…
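The teacher-student idea can be sketched with a generic knowledge-distillation loss, in which the compressed-domain student is trained both on the ground-truth label and to match the softened outputs of the raw-domain teacher. This is the standard temperature-scaled formulation, not necessarily the loss used in the paper. The function names and the `temperature` and `alpha` values are illustrative assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with optional temperature scaling."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Standard classification loss against the ground-truth class index."""
    return -math.log(softmax(logits)[label])

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a mimic term (student vs. teacher) and a hard-label term.

    Assumption: generic distillation objective; the paper's exact loss
    is not given in this summary.
    """
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    # The temperature**2 factor keeps the two terms' gradients on a similar scale.
    mimic = kl_divergence(soft_teacher, soft_student) * temperature ** 2
    hard = cross_entropy(student_logits, label)
    return alpha * mimic + (1 - alpha) * hard

teacher = [4.0, 1.0, 0.5]   # raw-domain teacher logits (made-up values)
student = [2.0, 1.5, 0.5]   # compressed-domain student logits (made-up values)
loss = distillation_loss(student, teacher, label=0)
```

When the student's logits equal the teacher's, the mimic term vanishes and only the hard-label term remains, so the student is pulled toward the teacher's behaviour only where the two disagree.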
