SparseTem: Boosting the Efficiency of CNN-Based Video Encoders by Exploiting Temporal Continuity
Kunyun Wang, Shuo Yang, Jieru Zhao, Wenchao Ding, Quan Chen, Jingwen Leng, Minyi Guo

TL;DR
SparseTem leverages temporal continuity in videos to significantly accelerate CNN-based encoders like EfficientDet and CRNN with minimal accuracy loss, using memory-efficient scheduling and online adjustment techniques.
Contribution
The paper introduces SparseTem, a novel framework that exploits temporal continuity to boost CNN-based video encoder efficiency while addressing memory and accuracy challenges.
Findings
Achieves 1.79x speedup for EfficientDet
Achieves 4.72x speedup for CRNN
Incurs minimal accuracy degradation
Abstract
Deep learning models have become pivotal in video processing and are increasingly critical in practical applications such as autonomous driving and object detection. Although Vision Transformers (ViTs) have demonstrated their power, Convolutional Neural Networks (CNNs) remain a highly efficient and high-performance choice for feature extraction and encoding. However, the intensive computational demands of convolution operations hinder their broader adoption as video encoders. Because video frames are temporally continuous, changes between consecutive frames are minimal, which allows redundant computations to be skipped. This technique, which we term Diff Computation, presents two primary challenges. First, Diff Computation requires caching intermediate feature maps to ensure the correctness of non-linear computations, leading to significant memory consumption.…
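To make the idea of computing on frame differences concrete, here is a minimal pure-Python sketch of the general principle. It is not SparseTem's implementation. It relies on the linearity of convolution, conv(x_t) = conv(x_{t-1}) + conv(x_t - x_{t-1}), so only pixels that changed contribute new work. Because ReLU is not linear, the sketch caches the pre-activation feature map and applies ReLU to the updated sum, which illustrates why cached intermediate feature maps are needed. All names here (`conv2d`, `add_diff_conv2d`, `DiffConvReLU`) are hypothetical, and the single-channel, stride-1, no-padding layer is an assumption chosen for brevity.

```python
def conv2d(x, w):
    """Dense 'valid' 2D cross-correlation of a single-channel image (lists of lists)."""
    kh, kw = len(w), len(w[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[i + a][j + b] * w[a][b] for a in range(kh) for b in range(kw))
             for j in range(ow)] for i in range(oh)]


def add_diff_conv2d(preact, diff, w):
    """Add conv2d(diff, w) into preact in place, visiting only nonzero diff pixels.

    Returns the number of nonzero diff pixels that were processed.
    """
    kh, kw = len(w), len(w[0])
    oh, ow = len(preact), len(preact[0])
    touched = 0
    for r, row in enumerate(diff):
        for c, d in enumerate(row):
            if d == 0:
                continue  # unchanged pixel: its contribution is already cached
            touched += 1
            # Input pixel (r, c) feeds output (r - a, c - b) through weight w[a][b].
            for a in range(kh):
                for b in range(kw):
                    i, j = r - a, c - b
                    if 0 <= i < oh and 0 <= j < ow:
                        preact[i][j] += d * w[a][b]
    return touched


def relu(fm):
    return [[max(v, 0) for v in row] for row in fm]


class DiffConvReLU:
    """Hypothetical conv + ReLU layer that reuses the previous frame's result."""

    def __init__(self, w):
        self.w = w
        self.prev_frame = None    # cached input of the previous frame
        self.preact = None        # cached pre-activation feature map
        self.last_touched = None  # nonzero diff pixels handled on the last call

    def __call__(self, frame):
        if self.prev_frame is None:
            # First frame: no history, so run the dense convolution.
            self.preact = conv2d(frame, self.w)
        else:
            diff = [[n - o for n, o in zip(new_row, old_row)]
                    for new_row, old_row in zip(frame, self.prev_frame)]
            self.last_touched = add_diff_conv2d(self.preact, diff, self.w)
        self.prev_frame = [row[:] for row in frame]
        # ReLU is applied to the full pre-activation, never to the diff alone,
        # because relu(a + b) != relu(a) + relu(b) in general.
        return relu(self.preact)


# Two synthetic 6x6 "frames" that differ in a single pixel.
w = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
frame0 = [[(3 * r + 5 * c) % 7 for c in range(6)] for r in range(6)]
frame1 = [row[:] for row in frame0]
frame1[2][3] += 4

layer = DiffConvReLU(w)
out0 = layer(frame0)
out1 = layer(frame1)
assert out1 == relu(conv2d(frame1, w))  # matches the dense result
```

The second call touches only the one changed pixel instead of all 36, at the cost of keeping `prev_frame` and `preact` in memory for the layer. That per-layer cache is the memory overhead the abstract describes as the first challenge.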
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition · Advanced Data Compression Techniques
Methods: Depthwise Convolution · Pointwise Convolution · Depthwise Separable Convolution · Batch Normalization · Convolution · BiFPN · EfficientDet
