Chained Predictions Using Convolutional Neural Networks

Georgia Gkioxari; Alexander Toshev; Navdeep Jaitly

arXiv:1605.02346 · cs.CV · October 25, 2016 · 2 cites

TL;DR

This paper introduces a convolutional neural network-based sequence-to-sequence model for structured output prediction in vision tasks, demonstrating improved performance in human pose estimation by predicting outputs sequentially with dependency on previous predictions.

Contribution

The paper adapts sequence-to-sequence models to spatial localization using CNNs, explores weight sharing between consecutive prediction steps, and reports state-of-the-art results in human pose estimation.

Findings

01 Chained predictions outperform previous methods in human pose estimation.

02 Untied weights are effective for fixed-structure problems.

03 Sequential prediction improves spatial localization accuracy.

Abstract

In this paper, we present an adaptation of the sequence-to-sequence model for structured output prediction in vision tasks. In this model the output variables for a given input are predicted sequentially using neural networks. The prediction for each output variable depends not only on the input but also on the previously predicted output variables. The model is applied to spatial localization tasks and uses convolutional neural networks (CNNs) for processing input images and a multi-scale deconvolutional architecture for making spatial predictions at each time step. We explore the impact of weight sharing with a recurrent connection matrix between consecutive predictions, and compare it to a formulation where these weights are not tied. Untied weights are particularly suited for problems with a fixed sized structure, where different classes of output are predicted in different steps.…
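The chaining idea in the abstract can be illustrated with a toy sketch. This is not the authors' architecture: it swaps the CNN and multi-scale deconvolutional stages for small dense layers with random weights, and every name here (`chained_predict`, `make_step_params`, `W_in`, `W_h`, `W_y`, `W_o`) is hypothetical. It only shows how each step's prediction is conditioned on the input and on earlier predictions, and how tied weights (one parameter set reused at every step) differ from untied weights (one parameter set per step).

```python
import math
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rng, rows, cols, scale=0.1):
    """Small random matrix; stands in for learned weights (assumption)."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def make_step_params(rng, hidden_dim, num_locations):
    """Hypothetical parameters for a single prediction step."""
    return {
        "W_h": rand_matrix(rng, hidden_dim, hidden_dim),     # previous state -> state
        "W_y": rand_matrix(rng, hidden_dim, num_locations),  # previous output -> state
        "W_o": rand_matrix(rng, num_locations, hidden_dim),  # state -> location scores
    }


def chained_predict(x_feat, W_in, params, num_steps):
    """Predict num_steps locations one after another.

    params holds one entry (tied weights) or num_steps entries (untied weights).
    """
    if len(params) not in (1, num_steps):
        raise ValueError("params must have length 1 (tied) or num_steps (untied)")
    tied = len(params) == 1
    num_locations = len(params[0]["W_o"])
    # Initial state computed from the input features (a stand-in for CNN features).
    h = [math.tanh(v) for v in matvec(W_in, x_feat)]
    y_prev = [0.0] * num_locations  # no output has been predicted before step 0
    outputs = []
    for t in range(num_steps):
        p = params[0] if tied else params[t]
        # The new state depends on the previous state and the previous prediction.
        pre = [a + b for a, b in zip(matvec(p["W_h"], h), matvec(p["W_y"], y_prev))]
        h = [math.tanh(v) for v in pre]
        probs = softmax(matvec(p["W_o"], h))
        loc = max(range(num_locations), key=probs.__getitem__)
        outputs.append(loc)
        # Feed the chosen location back as a one-hot vector for the next step.
        y_prev = [0.0] * num_locations
        y_prev[loc] = 1.0
    return outputs


rng = random.Random(0)
feat_dim, hidden_dim, num_locations, num_steps = 8, 16, 25, 4
x_feat = [rng.gauss(0.0, 1.0) for _ in range(feat_dim)]
W_in = rand_matrix(rng, hidden_dim, feat_dim)

tied = [make_step_params(rng, hidden_dim, num_locations)]
untied = [make_step_params(rng, hidden_dim, num_locations) for _ in range(num_steps)]

tied_out = chained_predict(x_feat, W_in, tied, num_steps)
untied_out = chained_predict(x_feat, W_in, untied, num_steps)
print(tied_out, untied_out)
```

In this sketch the untied variant gives each step its own parameters, which matches the abstract's point that a fixed-size structure lets different steps specialize in different output classes (for pose estimation, plausibly one joint per step). The tied variant reuses a single parameter set and so does not fix the number of steps in advance.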

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Robotics and Sensor-Based Localization