HomE: Homography-Equivariant Video Representation Learning

Anirudh Sriram; Adrien Gaidon; Jiajun Wu; Juan Carlos Niebles; Li Fei-Fei; Ehsan Adeli

arXiv:2306.01623 · cs.CV · June 5, 2023 · 1 cites

TL;DR

HomE introduces a self-supervised video representation learning method that explicitly models homography equivariance, improving performance on action recognition and pedestrian intent prediction tasks.

Contribution

The paper proposes a novel homography-equivariant representation learning approach for multi-view videos, leveraging geometric relationships for better self-supervised learning.

Findings

01 Achieves 96.4% 3-fold accuracy on UCF101 action classification.

02 Outperforms the state of the art by 6% on pedestrian intent prediction.

03 Obtains 91.2% accuracy for pedestrian action classification.

Abstract

Recent advances in self-supervised representation learning have enabled more efficient and robust model performance without relying on extensive labeled data. However, most works are still focused on images, with few working on videos and even fewer on multi-view videos, where more powerful inductive biases can be leveraged for self-supervision. In this work, we propose a novel method for representation learning of multi-view videos, where we explicitly model the representation space to maintain Homography Equivariance (HomE). Our method learns an implicit mapping between different views, culminating in a representation space that maintains the homography relationship between neighboring views. We evaluate our HomE representation via action recognition and pedestrian intent prediction as downstream tasks. On action classification, our method obtains 96.4% 3-fold accuracy on the UCF101…
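
As a rough illustration of what "maintaining the homography relationship" between views could mean, the toy sketch below checks whether two representations are related by a known homography. It is not the authors' implementation. The choice of 3-vector representations, the squared-error form of `equivariance_loss`, and the example matrix `H_ij` are all assumptions made for illustration.

```python
# Toy illustration of homography equivariance (not the authors' code).
# Assumption: each view's representation is a homogeneous 3-vector, so a
# 3x3 homography matrix can act on it directly.

def apply_homography(H, v):
    """Multiply a 3x3 matrix H (list of rows) by a 3-vector v."""
    return [sum(H[r][c] * v[c] for c in range(3)) for r in range(3)]

def normalize(v):
    """Scale a homogeneous 3-vector so its last coordinate is 1."""
    return [x / v[2] for x in v]

def equivariance_loss(H_ij, z_i, z_j):
    """Squared error between H_ij applied to z_i and z_j (hypothetical loss)."""
    pred = normalize(apply_homography(H_ij, z_i))
    target = normalize(z_j)
    return sum((p - t) ** 2 for p, t in zip(pred, target))

# Hypothetical homography between two neighboring views:
# scale by 2, then shift by (1, -1).
H_ij = [[2.0, 0.0, 1.0],
        [0.0, 2.0, -1.0],
        [0.0, 0.0, 1.0]]

z_i = [3.0, 4.0, 1.0]                    # representation of view i
z_j_good = apply_homography(H_ij, z_i)   # representation of view j that respects H_ij
z_j_bad = [0.0, 0.0, 1.0]                # representation that ignores the geometry

loss_good = equivariance_loss(H_ij, z_i, z_j_good)
loss_bad = equivariance_loss(H_ij, z_i, z_j_bad)
```

Under these assumptions, `loss_good` is zero because the second representation is exactly the transformed first one, while `loss_bad` is large. A learned encoder would be trained to push such a loss toward zero across neighboring views.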

Code & Models

Repositories

anirudhs123/home (Official)

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Domain Adaptation and Few-Shot Learning