MOOSE: Pay Attention to Temporal Dynamics for Video Understanding via Optical Flows

Hong Nguyen; Dung Tran; Hieu Hoang; Phong Nguyen; Shrikanth Narayanan

arXiv:2506.01119 · cs.CV · June 3, 2025

TL;DR

MOOSE is a novel video encoder that efficiently models temporal dynamics by integrating optical flow with spatial embeddings, improving interpretability and achieving state-of-the-art results across various video understanding tasks.

Contribution

Introducing MOOSE, a temporally-centric architecture that leverages pre-trained optical flow and visual encoders for efficient and interpretable video analysis.

Findings

01. State-of-the-art performance on multiple benchmarks
02. Reduced computational complexity compared to prior models
03. Enhanced interpretability of temporal dynamics

Abstract

Many motion-centric video analysis tasks, such as recognizing atomic actions, detecting atypical motor behavior in individuals with autism, or analyzing articulatory motion in real-time MRI of human speech, require efficient and interpretable temporal modeling. Capturing temporal dynamics is a central challenge in video analysis, often requiring significant computational resources and fine-grained annotations that are not widely available. This paper presents MOOSE (Motion Flow Over Spatial Space), a novel temporally-centric video encoder that explicitly integrates optical flow with spatial embeddings to model temporal information efficiently, inspired by human perception of motion. Unlike prior models, MOOSE takes advantage of rich, widely available pre-trained visual and optical flow encoders instead of training video models from scratch. This significantly reduces computational complexity while…
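The abstract does not say how optical flow and spatial embeddings are combined, so the following is only an illustrative sketch of one plausible fusion scheme, not the authors' architecture. It assumes that frozen pre-trained encoders have already produced per-frame spatial tokens and per-frame-pair flow tokens of the same width. It then lets each flow token attend over the spatial tokens of its source frame before pooling. The names `cross_attend` and `fuse_video`, the use of cross-attention, and the mean pooling are all hypothetical choices made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attend(queries, keys_values):
    """Scaled dot-product attention: each query attends over keys_values.

    Hypothetical fusion step: queries are flow (motion) tokens and
    keys_values are spatial tokens of one frame. No learned projections
    are used, to keep the sketch short.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    fused = []
    for q in queries:
        weights = softmax([dot(q, k) * scale for k in keys_values])
        fused.append([
            sum(w * v[j] for w, v in zip(weights, keys_values))
            for j in range(dim)
        ])
    return fused


def mean_pool(vectors):
    """Average a list of equal-length vectors into one vector."""
    dim = len(vectors[0])
    return [sum(v[j] for v in vectors) / len(vectors) for j in range(dim)]


def fuse_video(spatial_tokens, flow_tokens):
    """Return one clip-level embedding (assumed design, not from the paper).

    spatial_tokens: T frames x N tokens x D floats (stand-in for a frozen
                    visual encoder's output).
    flow_tokens:    (T - 1) frame pairs x M tokens x D floats (stand-in for
                    a frozen optical flow encoder's output).
    """
    if len(flow_tokens) != len(spatial_tokens) - 1:
        raise ValueError("expected one flow step per consecutive frame pair")
    per_step = []
    for t, flow in enumerate(flow_tokens):
        # Motion between frame t and t+1 queries the appearance of frame t.
        fused = cross_attend(flow, spatial_tokens[t])
        per_step.append(mean_pool(fused))
    # Average over time to get a single clip embedding.
    return mean_pool(per_step)
```

Because both encoders stay frozen in this sketch, only a fusion step like the one above would need computation at training time, which is one way to read the abstract's efficiency claim. The attention weights per flow token could also be inspected to see which spatial regions a given motion attends to, though whether MOOSE derives its interpretability this way is not stated in the text above.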

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Advanced Vision and Imaging