Unsupervised learning of depth and motion
Kishore Konda, Roland Memisevic

TL;DR
This paper introduces a single model for unsupervised learning of depth and motion from multi-camera images, video frames, or both. It uses biologically inspired "complex cell" like units that encode pixel correlations across image pairs, and it reports state-of-the-art performance in 3-D activity analysis.
Contribution
It presents an architecture that learns depth cues, motion cues, and their combinations with a single learning algorithm, using biologically inspired "complex cell" like units. The learned features outperform existing hand-engineered 3-D motion features by a very large margin.
Findings
Achieves state-of-the-art 3-D activity analysis performance
Outperforms existing hand-engineered 3-D motion features by a very large margin
Demonstrates effective joint learning of depth and motion cues
Abstract
We present a model for the joint estimation of disparity and motion. The model is based on learning about the interrelations between images from multiple cameras, multiple frames in a video, or the combination of both. We show that learning depth and motion cues, as well as their combinations, from data is possible within a single type of architecture and a single type of learning algorithm, by using biologically inspired "complex cell" like units, which encode correlations between the pixels across image pairs. Our experimental results show that the learning of depth and motion makes it possible to achieve state-of-the-art performance in 3-D activity analysis, and to outperform existing hand-engineered 3-D motion features by a very large margin.
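The idea of a "complex cell" like unit that encodes correlations between pixels across an image pair can be illustrated with a small energy-model sketch. This is not the authors' architecture: it assumes 1-D signals in place of images, a circular shift in place of disparity or motion, and fixed sine and cosine filters in place of learned ones. The names `shift`, `energy_unit`, and `estimate_shift` are hypothetical. Each unit squares the sum of two filter responses, one per input; expanding the square yields a cross term that is large only when the two inputs are related by the shift the unit is tuned to.

```python
import math


def shift(signal, d):
    """Circularly shift a 1-D signal by d samples (toy stand-in for disparity or motion)."""
    n = len(signal)
    return [signal[(t - d) % n] for t in range(n)]


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def energy_unit(x, y, d0, freqs=(1, 2, 3)):
    """Response of an energy unit tuned to shift d0.

    For each frequency, a cosine/sine filter pair is applied to x and a
    copy of the same pair, displaced by d0, is applied to y. The two
    responses are summed and squared, then pooled over the pair and
    over frequencies. Filters are fixed here, not learned (assumption).
    """
    n = len(x)
    total = 0.0
    for k in freqs:
        theta = 2.0 * math.pi * k / n
        for wave in (math.cos, math.sin):
            f = [wave(theta * t) for t in range(n)]         # filter on first input
            g = [wave(theta * (t - d0)) for t in range(n)]  # displaced filter on second input
            total += (dot(f, x) + dot(g, y)) ** 2
    return total


def estimate_shift(x, y):
    """Return the tuned shift whose unit responds most strongly to the pair (x, y)."""
    return max(range(len(x)), key=lambda d0: energy_unit(x, y, d0))


if __name__ == "__main__":
    left = [0.0] * 16
    left[3:6] = [1.0, 2.0, 1.0]   # a single bump
    right = shift(left, 5)        # same bump, displaced by 5 samples
    print(estimate_shift(left, right))
```

In this toy setting the most active unit identifies the shift between the two inputs. Treating the pair as two camera views corresponds to disparity, and treating it as two video frames corresponds to motion, which is one way to read the abstract's claim that a single type of unit can serve both.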
Taxonomy
Topics: Advanced Vision and Imaging · Human Pose and Action Recognition · Cell Image Analysis Techniques
