SemanticMoments: Training-Free Motion Similarity via Third Moment Features

Saar Huberman; Kfir Goldberg; Or Patashnik; Sagie Benaim; Ron Mokady

arXiv:2602.09146 · cs.CV · February 11, 2026



TL;DR

SemanticMoments introduces a training-free approach that leverages third moment features of pre-trained semantic models to improve motion similarity retrieval in videos, outperforming existing methods on new benchmarks.

Contribution

The paper presents SemanticMoments, a novel training-free method using higher-order moments of semantic features for motion similarity, addressing limitations of appearance-based and traditional motion inputs.

Findings

01. SemanticMoments outperforms RGB-based, optical-flow-based, and text-supervised methods on the SimMotion benchmarks.

02. Existing models struggle to disentangle motion from appearance.

03. The new SimMotion benchmarks reveal the appearance bias in current video representations.

Abstract

Retrieving videos based on semantic motion is a fundamental, yet unsolved, problem. Existing video representation approaches overly rely on static appearance and scene context rather than motion dynamics, a bias inherited from their training data and objectives. Conversely, traditional motion-centric inputs like optical flow lack the semantic grounding needed to understand high-level motion. To demonstrate this inherent bias, we introduce the SimMotion benchmarks, combining controlled synthetic data with a new human-annotated real-world dataset. We show that existing models perform poorly on these benchmarks, often failing to disentangle motion from appearance. To address this gap, we propose SemanticMoments, a simple, training-free method that computes temporal statistics (specifically, higher-order moments) over features from pre-trained semantic models. Across our benchmarks,…
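The core idea of computing temporal statistics over per-frame features can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the function names are hypothetical, the per-frame features are random stand-ins for the output of some pre-trained semantic encoder, the moments are taken as central moments per feature dimension, and cosine similarity is one plausible way to compare descriptors.

```python
import numpy as np


def temporal_moment_descriptor(frame_features, max_order=3):
    """Concatenate temporal moments of per-frame features (hypothetical helper).

    frame_features: array of shape (T, D), one D-dimensional embedding per frame.
    Returns a vector of shape (max_order * D,): the temporal mean followed by
    the central moments of order 2..max_order, each computed per dimension.
    """
    x = np.asarray(frame_features, dtype=np.float64)
    mean = x.mean(axis=0)
    centered = x - mean
    parts = [mean]
    for order in range(2, max_order + 1):
        # k-th central moment over the time axis
        parts.append((centered ** order).mean(axis=0))
    return np.concatenate(parts)


def cosine_similarity(a, b, eps=1e-12):
    """Cosine similarity between two descriptor vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


# Random stand-ins for per-frame features: 16 frames, 8 dimensions.
rng = np.random.default_rng(0)
video_a = rng.normal(size=(16, 8))
# Same temporal dynamics plus a constant offset in feature space,
# a crude proxy for a static appearance change.
video_b = video_a + 5.0

desc_a = temporal_moment_descriptor(video_a)
desc_b = temporal_moment_descriptor(video_b)

# Drop the mean block and keep only the higher-order (central) moments.
higher_a, higher_b = desc_a[8:], desc_b[8:]
motion_similarity = cosine_similarity(higher_a, higher_b)
```

In this toy setup the mean block of the descriptor changes with the constant offset, while the second and third central moments do not, since central moments are invariant to a constant shift. That is one intuition for why higher-order temporal statistics might emphasize dynamics over static appearance; whether the paper uses the moments in exactly this way is not stated in the text above.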


Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition · Human Motion and Animation