Smooth regularization for efficient video recognition

Gil Goldman; Raja Giryes; Mahadev Satyanarayanan

arXiv:2511.20928 · cs.CV · January 13, 2026

TL;DR

This paper introduces a smooth regularization method for video recognition that models changes in the intermediate-layer embeddings of consecutive frames as a Gaussian Random Walk, improving the accuracy of lightweight models by 3.8% to 6.4% on Kinetics-600.

Contribution

The paper presents a novel smooth regularization technique based on Gaussian Random Walks that improves temporal modeling in lightweight video recognition architectures.

Findings

01 Lightweight models trained with smooth regularization gain 3.8% to 6.4% accuracy on Kinetics-600.

02 MoViNets trained with smooth regularization improve the state of the art by 3.8% to 6.1% within their respective FLOP constraints.

03 MobileNetV3 and MoViNets-Stream gain 4.9% to 6.4% accuracy.

Abstract

We propose a smooth regularization technique that instills a strong temporal inductive bias in video recognition models, particularly benefiting lightweight architectures. Our method encourages smoothness in the intermediate-layer embeddings of consecutive frames by modeling their changes as a Gaussian Random Walk (GRW). This penalizes abrupt representational shifts, thereby promoting low-acceleration solutions that better align with the natural temporal coherence inherent in videos. By leveraging this enforced smoothness, lightweight models can more effectively capture complex temporal dynamics. Applied to such models, our technique yields a 3.8% to 6.4% accuracy improvement on Kinetics-600. Notably, the MoViNets model family trained with our smooth regularization improves the current state of the art by 3.8% to 6.1% within their respective FLOP constraints, while MobileNetV3 and the…
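The abstract does not give the loss itself, so the following is only a minimal sketch of one way to read it, not the authors' implementation. It assumes that the frame-to-frame embedding change follows a Gaussian Random Walk with isotropic noise, in which case the negative log-likelihood is proportional to the sum of squared second differences ("accelerations") of the embeddings. The function name `grw_smoothness_penalty` and the parameter `sigma` are hypothetical, and plain Python lists are used to keep the example dependency-free.

```python
def grw_smoothness_penalty(embeddings, sigma=1.0):
    """Mean squared second difference of a sequence of frame embeddings.

    ASSUMPTION (not from the paper's text): the change between consecutive
    embeddings, d_t = z_{t+1} - z_t, follows a Gaussian Random Walk
    d_{t+1} = d_t + eps with eps ~ N(0, sigma^2 I). Up to a constant, the
    negative log-likelihood is then sum_t ||z_{t+1} - 2 z_t + z_{t-1}||^2
    divided by 2 sigma^2, which is small for low-acceleration trajectories.

    embeddings: list of T embeddings, each a list of D floats.
    """
    if len(embeddings) < 3:
        # A second difference needs at least three consecutive frames.
        return 0.0
    total = 0.0
    for prev, cur, nxt in zip(embeddings, embeddings[1:], embeddings[2:]):
        for a, b, c in zip(prev, cur, nxt):
            accel = c - 2.0 * b + a  # discrete acceleration per dimension
            total += accel * accel
    num_triples = len(embeddings) - 2
    return total / (2.0 * sigma ** 2 * num_triples)


# Embeddings that drift at constant velocity incur no penalty.
steady = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
print(grw_smoothness_penalty(steady))  # 0.0

# An abrupt jump between two frames is penalized.
abrupt = [[0.0], [0.0], [1.0], [1.0]]
print(grw_smoothness_penalty(abrupt))  # 0.5
```

Under the same assumptions, such a term would be added to the classification loss with a weighting coefficient (a hypothetical hyperparameter here), so that training trades task accuracy against the smoothness of intermediate-layer embeddings.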

Code & Models

Models

🤗 DrGil/grw-smoothing-movinet · model

Datasets

DrGil/k600_test_ds · dataset · 13 downloads

Videos

Smooth Regularization for Efficient Video Recognition · SlidesLive

Taxonomy

Topics: Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications