SparseTem: Boosting the Efficiency of CNN-Based Video Encoders by Exploiting Temporal Continuity

Kunyun Wang; Shuo Yang; Jieru Zhao; Wenchao Ding; Quan Chen; Jingwen Leng; Minyi Guo

arXiv:2410.20790 · cs.CV · August 12, 2025

TL;DR

SparseTem leverages temporal continuity in videos to significantly accelerate CNN-based encoders like EfficientDet and CRNN with minimal accuracy loss, using memory-efficient scheduling and online adjustment techniques.

Contribution

The paper introduces SparseTem, a novel framework that exploits temporal continuity to boost CNN-based video encoder efficiency while addressing memory and accuracy challenges.

Findings

01. Achieves 1.79x speedup for EfficientDet
02. Achieves 4.72x speedup for CRNN
03. Maintains minimal accuracy degradation

Abstract

Deep learning models have become pivotal in the field of video processing and are increasingly critical in practical applications such as autonomous driving and object detection. Although Vision Transformers (ViTs) have demonstrated their power, Convolutional Neural Networks (CNNs) remain a highly efficient and high-performance choice for feature extraction and encoding. However, the intensive computational demands of convolution operations hinder their broader adoption as video encoders. Given the inherent temporal continuity in video frames, changes between consecutive frames are minimal, allowing redundant computations to be skipped. This technique, which we term Diff Computation, presents two primary challenges. First, Diff Computation requires caching intermediate feature maps to ensure the correctness of non-linear computations, leading to significant memory consumption.…
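The idea behind Diff Computation can be illustrated with a small sketch. This is not the SparseTem implementation; it is a minimal pure-Python example that assumes a 1D convolution followed by ReLU, and the names `conv1d`, `relu`, and `DiffConvReLU` are hypothetical. Because convolution is linear, the output for a new frame equals the cached output of the previous frame plus the convolution of the frame difference, and zero entries in that difference can be skipped. ReLU is not linear, so the layer has to keep the pre-activation feature map in a cache, which is the memory cost the abstract refers to.

```python
def conv1d(signal, kernel):
    """Valid-mode 1D cross-correlation; linear in `signal`."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Element-wise ReLU (a non-linear operation)."""
    return [max(0.0, v) for v in values]


class DiffConvReLU:
    """Hypothetical conv + ReLU layer that reuses the previous frame's result."""

    def __init__(self, kernel):
        self.kernel = kernel
        self.prev_input = None   # cached input of the previous frame
        self.prev_preact = None  # cached pre-activation feature map

    def forward(self, frame):
        if self.prev_input is None:
            # First frame: nothing is cached yet, so compute densely.
            preact = conv1d(frame, self.kernel)
        else:
            diff = [a - b for a, b in zip(frame, self.prev_input)]
            preact = list(self.prev_preact)
            k = len(self.kernel)
            for pos, d in enumerate(diff):
                if d == 0.0:
                    continue  # unchanged input element: no work needed
                # Scatter the change into every output it contributes to.
                for j in range(k):
                    out = pos - j
                    if 0 <= out < len(preact):
                        preact[out] += d * self.kernel[j]
        self.prev_input = list(frame)
        self.prev_preact = preact
        # ReLU is applied to the full pre-activation map, not to the diff.
        return relu(preact)


kernel = [1.0, -1.0, 2.0]
frame0 = [1.0, -2.0, 3.0, 0.0, 1.0, 4.0]
frame1 = [1.0, -2.0, 3.0, 5.0, 1.0, 4.0]  # only one element changed

layer = DiffConvReLU(kernel)
out0 = layer.forward(frame0)
out1 = layer.forward(frame1)

# The incremental result matches a dense recomputation of the second frame.
print(out1 == relu(conv1d(frame1, kernel)))
```

In this sketch the second frame touches only the outputs influenced by the single changed element, while `prev_preact` holds a full feature map per layer. Applying ReLU directly to the convolved difference would give wrong results, which is why the cache is needed.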

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition · Advanced Data Compression Techniques

Methods: Depthwise Convolution · Pointwise Convolution · Depthwise Separable Convolution · Batch Normalization · Convolution · BiFPN · EfficientDet