Unsupervised Keypoint Learning for Guiding Class-Conditional Video Prediction

Yunji Kim; Seonghyeon Nam; In Cho; Seon Joo Kim

arXiv:1910.02027·cs.CV·October 7, 2019·30 cites

TL;DR

This paper introduces an unsupervised deep learning approach for keypoint detection and video prediction, enabling realistic future frame generation from a single image and action class without manual keypoint labeling.

Contribution

It presents a novel unsupervised method for detecting keypoints and predicting future video frames conditioned on an image and action class.

Findings

01. The detected keypoints are similar to human-annotated labels.

02. The predicted videos are more realistic than those of previous methods.

03. The method applies to various datasets without manual keypoint labels.

Abstract

We propose a deep video prediction model conditioned on a single image and an action class. To generate future frames, we first detect keypoints of a moving object and predict future motion as a sequence of keypoints. The input image is then translated following the predicted keypoint sequence to compose future frames. Detecting the keypoints is central to our algorithm, and our method is trained to detect the keypoints of arbitrary objects in an unsupervised manner. Moreover, the keypoints detected in the original videos are used as pseudo-labels to learn the motion of objects. Experimental results show that our method can be applied to various datasets without the cost of labeling keypoints in videos. The detected keypoints are similar to human-annotated labels, and the prediction results are more realistic than those of previous methods.
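The three stages named in the abstract (detect keypoints, predict a keypoint sequence for the given action class, translate the input image along that sequence) can be sketched as a data flow. This is only an illustration of how the pieces connect and what shapes pass between them, not the authors' implementation: every function name, the keypoint count, the number of classes, and the hand-written logic inside each stage are hypothetical stand-ins for what the paper describes as learned networks.

```python
import numpy as np

NUM_KEYPOINTS = 4  # hypothetical; the abstract does not state a count


def detect_keypoints(image, k=NUM_KEYPOINTS):
    """Stand-in for the learned detector (assumption, not the paper's model).

    Returns one (x, y) point per vertical strip of the image, placed at the
    intensity-weighted centroid of that strip. Output shape: (K, 2).
    """
    gray = image.mean(axis=-1)                      # (H, W)
    h, w = gray.shape
    ys, xs = np.mgrid[0:h, 0:w]
    points = []
    for cols in np.array_split(np.arange(w), k):
        weight = gray[:, cols] + 1e-8               # avoid division by zero
        points.append([(xs[:, cols] * weight).sum() / weight.sum(),
                       (ys[:, cols] * weight).sum() / weight.sum()])
    return np.array(points)


def predict_motion(keypoints, action_class, num_frames):
    """Stand-in for the learned motion predictor (assumption).

    Each action class maps to a fixed per-frame velocity in pixels.
    Output shape: (T, K, 2).
    """
    velocities = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])  # 3 toy classes
    steps = np.arange(1, num_frames + 1)[:, None, None]           # (T, 1, 1)
    return keypoints[None] + steps * velocities[action_class]


def translate_image(image, src_kp, dst_kp):
    """Stand-in for the learned image translator (assumption).

    Shifts the whole image by the mean keypoint displacement.
    """
    dx, dy = np.round((dst_kp - src_kp).mean(axis=0)).astype(int)
    return np.roll(image, shift=(int(dy), int(dx)), axis=(0, 1))


def predict_video(image, action_class, num_frames=8):
    """Single image + action class -> (frames, keypoint sequence)."""
    kp0 = detect_keypoints(image)
    kp_seq = predict_motion(kp0, action_class, num_frames)
    frames = [translate_image(image, kp0, kp) for kp in kp_seq]
    return np.stack(frames), kp_seq
```

Calling `predict_video(image, action_class=0, num_frames=5)` on a 16x16 RGB array returns frames of shape `(5, 16, 16, 3)` and a keypoint sequence of shape `(5, 4, 2)`. In the method as the abstract describes it, the motion stage would be trained on keypoints detected in real videos (the pseudo-labels), which this toy version replaces with fixed velocities.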

Code & Models

Repositories

YunjiKim/Unsupervised-Keypoint-Learning-for-Guiding-Class-conditional-Video-Prediction
Framework: TensorFlow

Taxonomy

Topics: Human Pose and Action Recognition · Video Analysis and Summarization · Multimodal Machine Learning Applications