HomE: Homography-Equivariant Video Representation Learning
Anirudh Sriram, Adrien Gaidon, Jiajun Wu, Juan Carlos Niebles, Li Fei-Fei, Ehsan Adeli

TL;DR
HomE introduces a self-supervised video representation learning method that explicitly models homography equivariance, improving performance on action recognition and pedestrian intent prediction tasks.
Contribution
The paper proposes a novel homography-equivariant representation learning approach for multi-view videos, leveraging geometric relationships for better self-supervised learning.
Findings
Achieves 96.4% 3-fold accuracy on UCF101 action classification.
Outperforms the state of the art by 6% on pedestrian intent prediction.
Obtains 91.2% accuracy for pedestrian action classification.
Abstract
Recent advances in self-supervised representation learning have enabled more efficient and robust model performance without relying on extensive labeled data. However, most works are still focused on images, with few working on videos and even fewer on multi-view videos, where more powerful inductive biases can be leveraged for self-supervision. In this work, we propose a novel method for representation learning of multi-view videos, where we explicitly model the representation space to maintain Homography Equivariance (HomE). Our method learns an implicit mapping between different views, culminating in a representation space that maintains the homography relationship between neighboring views. We evaluate our HomE representation via action recognition and pedestrian intent prediction as downstream tasks. On action classification, our method obtains 96.4% 3-fold accuracy on the UCF101…
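The equivariance idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes, purely for illustration, that each view's representation is a set of 3-dimensional vectors and that the homography between two neighboring views is known as a 3x3 matrix. The names `apply_homography` and `equivariance_loss` are hypothetical. The loss is zero when transforming view A's representation by the homography reproduces view B's representation, which is what "maintaining the homography relationship" would mean under these assumptions.

```python
import numpy as np


def apply_homography(H, z):
    """Apply a 3x3 homography H to an (N, 3) array of vectors z (hypothetical layout)."""
    return z @ H.T


def equivariance_loss(z_a, z_b, H_ab):
    """Mean squared error between H_ab applied to view A's representation
    and view B's representation. Illustrative objective, not the paper's exact loss."""
    pred_b = apply_homography(H_ab, z_a)
    return float(np.mean((pred_b - z_b) ** 2))


rng = np.random.default_rng(0)

# Assumed homography relating two neighboring views.
H_ab = np.array([
    [1.0, 0.1, 2.0],
    [0.0, 1.0, -1.0],
    [0.0, 0.0, 1.0],
])

# Stand-in for an encoder's output on view A: 8 vectors of dimension 3.
z_a = rng.normal(size=(8, 3))

# A perfectly equivariant encoder would produce this for view B.
z_b_equivariant = apply_homography(H_ab, z_a)

# An encoder with no geometric structure might produce something unrelated.
z_b_random = rng.normal(size=(8, 3))

loss_eq = equivariance_loss(z_a, z_b_equivariant, H_ab)
loss_rand = equivariance_loss(z_a, z_b_random, H_ab)
print(loss_eq, loss_rand)
```

In a training setup along these lines, `z_a` and `z_b` would come from a shared encoder applied to two views, and minimizing the loss would push the encoder toward representations that transform consistently with the views themselves.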
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Domain Adaptation and Few-Shot Learning
