Temporal-Needle: A view and appearance invariant video descriptor
Michal Yarom, Michal Irani

TL;DR
The paper introduces the 'temporal-needle' video descriptor that captures dynamic behavior invariant to viewpoint and appearance, enabling improved action detection, alignment, and clustering across diverse videos.
Contribution
It proposes a novel multi-scale, self-similarity based descriptor that is invariant to viewpoint and appearance, and introduces a method for efficient matching using statistically significant descriptors.
Findings
Effective in detecting similar actions across videos with different viewpoints and backgrounds.
Useful for temporal/spatial alignment and unsupervised video clustering.
Demonstrated on stationary camera videos, with potential extension to moving cameras.
Abstract
The ability to detect similar actions across videos can be very useful for real-world applications in many fields. However, this task is still challenging for existing systems, since videos that present the same action can be taken from significantly different viewing directions, performed by different actors against different backgrounds, and recorded at various video qualities. Video descriptors play a significant role in these systems. In this work we propose the "temporal-needle" descriptor, which captures the dynamic behavior while being invariant to viewpoint and appearance. The descriptor is computed over multiple temporal scales of the video, by computing self-similarity for every patch through time in every temporal scale. The descriptor is computed for every pixel in the video. However, to find similar actions across videos, we consider only a small subset of the descriptors - the statistical…
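The idea of computing a patch's self-similarity through time at several temporal scales can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not specify the patch size, the set of scales, the similarity measure, or the normalization, so the function names (`temporal_downsample`, `temporal_self_similarity`, `temporal_needle_sketch`), the frame-averaging pyramid, the sum-of-squared-differences measure, and the L2 normalization below are all assumptions chosen for illustration.

```python
import numpy as np

def temporal_downsample(video, factor):
    """Average groups of `factor` consecutive frames (assumed temporal pyramid)."""
    usable = (video.shape[0] // factor) * factor
    grouped = video[:usable].reshape(usable // factor, factor, *video.shape[1:])
    return grouped.mean(axis=1)

def temporal_self_similarity(video, t, y, x, radius=1, span=2):
    """SSD between the patch at (t, y, x) and the same-location patch at t + dt.

    The pixel (y, x) must lie at least `radius` pixels inside the frame.
    Time indices that fall outside the clip are clamped to its first/last frame.
    """
    n_frames = video.shape[0]

    def patch(tt):
        tt = int(np.clip(tt, 0, n_frames - 1))
        return video[tt, y - radius:y + radius + 1, x - radius:x + radius + 1]

    ref = patch(t)
    offsets = [dt for dt in range(-span, span + 1) if dt != 0]
    return np.array([np.sum((ref - patch(t + dt)) ** 2) for dt in offsets])

def temporal_needle_sketch(video, t, y, x, scales=(1, 2, 4), radius=1, span=2):
    """Concatenate temporal self-similarities over several temporal scales.

    Hypothetical illustration only; `video` is a (T, H, W) grayscale array.
    """
    video = np.asarray(video, dtype=np.float64)
    parts = []
    for s in scales:
        coarse = temporal_downsample(video, s)
        parts.append(temporal_self_similarity(coarse, t // s, y, x, radius, span))
    desc = np.concatenate(parts)
    norm = np.linalg.norm(desc)
    # L2 normalization removes the effect of a global gain on the intensities;
    # a global offset already cancels in the patch differences.
    return desc / norm if norm > 0 else desc

# Usage on a synthetic clip (random data, purely for demonstration).
rng = np.random.default_rng(0)
video = rng.random((16, 8, 8))
desc = temporal_needle_sketch(video, t=8, y=4, x=4)
```

Because the sketch compares a patch only with itself over time, a global brightness offset or contrast gain applied to the whole clip leaves the normalized vector unchanged. This is a much weaker property than the viewpoint and appearance invariance the paper claims, and is meant only to show why self-similarity is a natural starting point for it.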
Taxonomy
Topics: Human Pose and Action Recognition · Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques
