Incorporating Scalability in Unsupervised Spatio-Temporal Feature Learning

Sujoy Paul; Sourya Roy; Amit K. Roy-Chowdhury

arXiv:1808.01727·cs.CV·August 16, 2018

Open Access

TL;DR

This paper introduces a simple yet effective unsupervised framework for learning spatio-temporal features from videos using a Convolutional 3D Siamese network, reducing reliance on labeled data.

Contribution

It presents a novel unsupervised learning approach with a Siamese network for spatio-temporal feature embedding from unlabeled videos.

Findings

01 Effective feature learning across multiple datasets

02 Applicable to various computer vision tasks

03 Reduces need for labeled video data

Abstract

Deep neural networks are efficient learning machines that leverage large amounts of manually labeled data to learn discriminative features. However, acquiring a substantial amount of supervised data, especially for videos, can be tedious across various computer vision tasks. This necessitates learning visual features from videos in an unsupervised setting. In this paper, we propose a computationally simple yet effective framework to learn a spatio-temporal feature embedding from unlabeled videos. We train a Convolutional 3D Siamese network using positive and negative pairs mined from videos under certain probabilistic assumptions. Experimental results on three datasets demonstrate that our proposed framework learns weights that can be used on the same dataset and task as well as across datasets and tasks.
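The abstract does not spell out how pairs are mined or which loss is used, so the following is only a rough sketch of the general idea under stated assumptions. It assumes that clips starting close together in the same video are treated as positives and that clips from different videos are treated as negatives, and it uses a standard margin-based contrastive loss. The names `mine_pairs`, `contrastive_loss`, `max_gap`, and `margin` are hypothetical and not taken from the paper, and the embeddings stand in for the output of a 3D convolutional branch that is not implemented here.

```python
import math
import random
from typing import List, Sequence, Tuple

Clip = Tuple[int, int]  # (video_index, start_frame); hypothetical clip identifier


def mine_pairs(num_frames: List[int], clip_len: int, max_gap: int,
               n_pairs: int, seed: int = 0) -> List[Tuple[Clip, Clip, int]]:
    """Sample (clip_a, clip_b, label) triples; label 1 = positive, 0 = negative.

    Assumed heuristic (not taken from the paper): clips whose start frames
    are within `max_gap` of each other in the same video are positives;
    clips drawn from two different videos are negatives.
    """
    rng = random.Random(seed)
    # Only videos long enough to hold one full clip are usable.
    usable = [v for v, n in enumerate(num_frames) if n >= clip_len]
    if len(usable) < 2:
        raise ValueError("need at least two videos long enough for a clip")
    pairs = []
    for i in range(n_pairs):
        v = rng.choice(usable)
        last_start = num_frames[v] - clip_len
        a = rng.randint(0, last_start)
        if i % 2 == 0:
            # Positive: a second clip from the same video, nearby in time.
            lo, hi = max(0, a - max_gap), min(last_start, a + max_gap)
            pairs.append(((v, a), (v, rng.randint(lo, hi)), 1))
        else:
            # Negative: a clip from a different video.
            w = rng.choice([u for u in usable if u != v])
            b = rng.randint(0, num_frames[w] - clip_len)
            pairs.append(((v, a), (w, b), 0))
    return pairs


def contrastive_loss(emb_a: Sequence[float], emb_b: Sequence[float],
                     label: int, margin: float = 1.0) -> float:
    """Margin-based contrastive loss on one pair of embeddings.

    Positives are pulled together (squared distance); negatives are pushed
    apart until they are at least `margin` away from each other.
    """
    d = math.sqrt(sum((x - y) ** 2 for x, y in zip(emb_a, emb_b)))
    if label == 1:
        return d * d
    return max(0.0, margin - d) ** 2
```

In a Siamese setup, both clips of a pair would pass through the same network with shared weights, and the loss above would be applied to the two resulting embeddings. The actual mining rule and objective used by the authors may differ from this sketch.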

Taxonomy

Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Advanced Image and Video Retrieval Techniques

Methods: Siamese Network