Unsupervised Learning of View-invariant Action Representations
Junnan Li, Yongkang Wong, Qi Zhao, Mohan S. Kankanhalli

TL;DR
This paper introduces an unsupervised framework for learning view-invariant action representations by predicting 3D motion across multiple views, reducing reliance on labeled data and improving action recognition.
Contribution
It proposes a novel unsupervised learning task of cross-view motion prediction combined with view-adversarial training to learn view-invariant features.
Findings
Effective action recognition on multiple datasets
Learned representations capture view-invariant motion dynamics
View-adversarial training enhances feature invariance
Abstract
The recent success in human action recognition with deep learning methods mostly adopts the supervised learning paradigm, which requires a significant amount of manually labeled data to achieve good performance. However, label collection is an expensive and time-consuming process. In this work, we propose an unsupervised learning framework, which exploits unlabeled data to learn video representations. Different from previous works in video representation learning, our unsupervised learning task is to predict 3D motion in multiple target views using the video representation from a source view. By learning to extrapolate cross-view motions, the representation can capture view-invariant motion dynamics that are discriminative for the action. In addition, we propose a view-adversarial training method to enhance learning of view-invariant features. We demonstrate the effectiveness of the learned…
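The abstract does not say how the view-adversarial training is implemented. One common way to train adversarially for invariance is a gradient reversal layer, and the sketch below illustrates that idea only; it is an assumption, not the authors' confirmed method. All function names, the weight `lam`, and the toy view classifier (which reads the features directly as logits) are hypothetical and chosen for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy loss of a (hypothetical) view classifier for one sample."""
    return -math.log(softmax(logits)[label])


def cross_entropy_grad(logits, label):
    """Gradient of the cross-entropy loss with respect to the logits."""
    probs = softmax(logits)
    return [p - (1.0 if i == label else 0.0) for i, p in enumerate(probs)]


def gradient_reversal_backward(grad, lam):
    """Backward pass of a gradient reversal layer.

    The forward pass is the identity; the backward pass scales the incoming
    gradient by -lam, so the encoder is pushed to make the view harder
    to classify while the classifier itself still learns to classify it.
    """
    return [-lam * g for g in grad]


def encoder_feature_grad(motion_grad, view_grad, lam):
    """Gradient reaching the encoder features (toy assumption: the view
    classifier reads the features directly as logits).

    motion_grad: gradient from the cross-view motion prediction loss.
    view_grad:   gradient from the view classification loss.
    """
    reversed_view = gradient_reversal_backward(view_grad, lam)
    return [m + r for m, r in zip(motion_grad, reversed_view)]


if __name__ == "__main__":
    features = [0.0, 0.0]   # toy 2-view logits
    view_label = 0          # index of the source view
    view_grad = cross_entropy_grad(features, view_label)
    motion_grad = [0.1, 0.2]  # made-up gradient from the motion loss
    print(cross_entropy(features, view_label))
    print(encoder_feature_grad(motion_grad, view_grad, lam=1.0))
```

With `lam` set to 0 the encoder receives only the motion-prediction gradient; raising `lam` increases the pressure to discard view-specific information. In a real system an autodiff framework would handle the backward pass, and the classifier would have its own parameters.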
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Video Analysis and Summarization
