Faster and Accurate Compressed Video Action Recognition Straight from the Frequency Domain
Samuel Felipe dos Santos, Jurandy Almeida

TL;DR
This paper introduces a deep neural network that recognizes human actions directly from compressed video data, reducing decoding overhead and running up to 2 times faster at inference while maintaining competitive accuracy.
Contribution
The novel approach enables action recognition directly in the frequency domain from compressed videos, bypassing the decoding step for faster processing.
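To make "frequency domain" concrete, the sketch below computes the 8x8 block-wise DCT that JPEG/MPEG-style codecs use to represent image content. It is an illustration only, not the authors' pipeline: the function names (`dct_matrix`, `blockwise_dct`, `blockwise_idct`) are hypothetical, the input is assumed to be a single grayscale frame, and a real codec would read such coefficients from the bitstream rather than recompute them from pixels.

```python
import numpy as np

BLOCK = 8  # block size commonly used by JPEG/MPEG-style codecs (assumption)


def dct_matrix(n: int = BLOCK) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)  # frequency index (rows)
    i = np.arange(n).reshape(1, -1)  # sample index (columns)
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    mat[0, :] = np.sqrt(1.0 / n)  # DC row uses a different scale factor
    return mat


def blockwise_dct(frame: np.ndarray, n: int = BLOCK) -> np.ndarray:
    """Return DCT coefficients shaped (H//n, W//n, n, n) for a grayscale frame."""
    h, w = frame.shape
    if h % n or w % n:
        raise ValueError("frame height and width must be multiples of the block size")
    d = dct_matrix(n)
    # Split the frame into non-overlapping n x n blocks.
    blocks = frame.reshape(h // n, n, w // n, n).transpose(0, 2, 1, 3)
    # 2-D DCT of every block: D @ X @ D^T (matmul broadcasts over the block grid).
    return d @ blocks.astype(np.float64) @ d.T


def blockwise_idct(coeffs: np.ndarray, n: int = BLOCK) -> np.ndarray:
    """Inverse of blockwise_dct: rebuild the (H, W) frame from its coefficients."""
    d = dct_matrix(n)
    blocks = d.T @ coeffs @ d
    bh, bw = coeffs.shape[:2]
    return blocks.transpose(0, 2, 1, 3).reshape(bh * n, bw * n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.integers(0, 256, size=(16, 24)).astype(np.float64)
    coeffs = blockwise_dct(frame)
    print(coeffs.shape)  # one 8x8 coefficient block per 8x8 pixel block
    print(np.allclose(blockwise_idct(coeffs), frame))
```

A network that learns "straight from the frequency domain" would take arrays like `coeffs` as input instead of RGB pixels, which is what lets it skip the inverse transform during decoding.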
Findings
Achieved comparable accuracy to state-of-the-art methods on UCF-101 and HMDB-51 datasets.
Ran up to 2 times faster at inference than traditional methods.
Demonstrated effectiveness of frequency domain processing for video action recognition.
Abstract
Human action recognition has become one of the most active fields of research in computer vision due to its wide range of applications, such as surveillance, medical and industrial environments, and smart homes, among others. Recently, deep learning has been successfully used to learn powerful and interpretable features for recognizing human actions in videos. Most existing deep learning approaches have been designed to process video information as RGB image sequences. For this reason, a preliminary decoding process is required, since video data are often stored in a compressed format. However, decoding a video demands a high computational load and memory usage. To overcome this problem, we propose a deep neural network capable of learning straight from compressed video. Our approach was evaluated on two public benchmarks, the UCF-101 and HMDB-51 datasets, demonstrating…
