Unfolding Videos Dynamics via Taylor Expansion
Siyi Chen, Minkyu Choi, Zesen Zhao, Kuan Han, Qing Qu, and Zhongming Liu

TL;DR
This paper introduces ViDiDi, a self-supervised learning method that uses temporal derivatives and Taylor series expansion to emphasize motion dynamics in video representations, improving performance on action recognition, video retrieval, and action detection benchmarks.
Contribution
The paper presents a novel self-supervised strategy using temporal derivatives and Taylor expansion, integrated into existing frameworks, to better capture motion dynamics in videos.
Findings
Improved action recognition accuracy on UCF101 and Kinetics.
Enhanced video retrieval and action detection performance.
Effective learning of motion features without large models or datasets.
Abstract
Taking inspiration from physical motion, we present a new self-supervised dynamics learning strategy for videos: Video Time-Differentiation for Instance Discrimination (ViDiDi). ViDiDi is a simple and data-efficient strategy, readily applicable to existing self-supervised video representation learning frameworks based on instance discrimination. At its core, ViDiDi observes different aspects of a video through various orders of temporal derivatives of its frame sequence. These derivatives, along with the original frames, support the Taylor series expansion of the underlying continuous dynamics at discrete times, where higher-order derivatives emphasize higher-order motion features. ViDiDi learns a single neural network that encodes a video and its temporal derivatives into consistent embeddings following a balanced alternating learning algorithm. By learning consistent representations…
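As a rough illustration of the "temporal derivatives of a frame sequence" idea in the abstract, the sketch below approximates successive derivatives with forward finite differences along the time axis. This is an assumption made for illustration: the abstract does not state which discretization ViDiDi uses, and the function name `temporal_derivatives`, the `max_order` parameter, and the toy clip are all hypothetical.

```python
import numpy as np

def temporal_derivatives(frames, max_order=2):
    """Return [frames, 1st difference, 2nd difference, ...] along axis 0 (time).

    Assumption: forward finite differences stand in for temporal derivatives;
    the paper's actual discretization may differ.
    """
    frames = np.asarray(frames, dtype=np.float32)
    views = [frames]
    for _ in range(max_order):
        # Each difference shortens the time axis by one frame.
        views.append(np.diff(views[-1], axis=0))
    return views

# Hypothetical toy clip: 8 grayscale 4x4 frames whose intensity grows as t^2.
t = np.arange(8, dtype=np.float32)
clip = (t ** 2)[:, None, None] * np.ones((8, 4, 4), dtype=np.float32)

views = temporal_derivatives(clip, max_order=2)
shapes = [v.shape for v in views]
```

On this toy clip the first difference grows linearly in time and the second difference is constant, which mirrors how higher-order derivatives isolate higher-order motion terms of a Taylor expansion. In an instance-discrimination setup, each view in `views` would be passed through the same encoder; that training loop is not shown here.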
Taxonomy
Topics: Advanced Vision and Imaging · Nonlinear Dynamics and Pattern Formation · Chaos control and synchronization
Methods: Bootstrap Your Own Latent
