Object Parsing in Sequences Using CoordConv Gated Recurrent Networks

Ayush Gaud; Y V S Harish; K Madhava Krishna

arXiv:1910.00895·cs.CV·October 3, 2019

Open Access

TL;DR

This paper introduces a recurrent neural network architecture with CoordConvGRU units for consistent keypoint localization in video sequences, effectively modeling motion dynamics and achieving real-time performance.

Contribution

It proposes a novel CoordConvGRU memory cell and a recurrent hourglass architecture, enhancing sequential keypoint localization with improved accuracy and efficiency.

Findings

01 Outperforms the baseline hourglass network in keypoint localization

02 Achieves real-time processing on a standard GPU

03 Requires minimal fine-tuning on real data

Abstract

We present a monocular object parsing framework for consistent keypoint localization that captures temporal correlation in sequential data. We propose a novel recurrent-network-based architecture to model long-range dependencies between intermediate features, which are highly useful in tasks like keypoint localization and tracking. We leverage the expressiveness of the popular stacked hourglass architecture and augment it by adopting memory units between intermediate layers of the network, with weights shared across stages for video frames. We observe that this weight-sharing scheme not only enables us to frame the hourglass architecture as a recurrent network but also proves highly effective in producing increasingly refined estimates for sequential tasks. Furthermore, we propose a new memory cell, which we call CoordConvGRU, that learns to selectively preserve spatio-temporal…
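The abstract is truncated here, so the exact CoordConvGRU formulation is not available on this page. As a rough illustration only, the sketch below combines two well-known ingredients that the name suggests: CoordConv-style coordinate channels (normalized row and column indices appended to a feature map) and a standard GRU gating update applied per pixel. Everything in it is an assumption made for illustration, not the authors' method: the function names, the parameter dictionary, the use of 1x1 channel mixing in place of spatial convolutions, and the choice to normalize coordinates to [-1, 1].

```python
import numpy as np


def add_coord_channels(x):
    """Append normalized row (y) and column (x) coordinate channels.

    x has shape (C, H, W); the result has shape (C + 2, H, W).
    Coordinates run from -1 to 1 (an assumed normalization).
    """
    _, h, w = x.shape
    ys = np.linspace(-1.0, 1.0, h)
    xs = np.linspace(-1.0, 1.0, w)
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    return np.concatenate([x, yy[None], xx[None]], axis=0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def mix(weights, x):
    """1x1 convolution: weights (C_out, C_in), x (C_in, H, W)."""
    return np.einsum("oc,chw->ohw", weights, x)


def coord_gru_step(x, h, params):
    """One GRU-style update on feature maps with coordinate channels.

    x: input features (C_x, H, W); h: previous hidden state (C_h, H, W).
    params: hypothetical dict with "Wz", "Wr", "Wh", each of shape
    (C_h, C_x + C_h + 2).
    """
    xh = add_coord_channels(np.concatenate([x, h], axis=0))
    z = sigmoid(mix(params["Wz"], xh))  # update gate
    r = sigmoid(mix(params["Wr"], xh))  # reset gate
    xrh = add_coord_channels(np.concatenate([x, r * h], axis=0))
    h_candidate = np.tanh(mix(params["Wh"], xrh))
    # Blend the previous state and the candidate state per pixel.
    return (1.0 - z) * h + z * h_candidate


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c_x, c_h, height, width = 3, 2, 4, 5
    params = {
        name: rng.normal(size=(c_h, c_x + c_h + 2))
        for name in ("Wz", "Wr", "Wh")
    }
    h = np.zeros((c_h, height, width))
    # The same params are reused for every frame, which is the
    # weight-sharing idea the abstract describes.
    for _ in range(3):
        frame_features = rng.normal(size=(c_x, height, width))
        h = coord_gru_step(frame_features, h, params)
    print(h.shape)
```

Reusing one parameter set across frames is what lets a stack of identical stages be read as a recurrent network unrolled over time; the coordinate channels give the gates access to absolute position, which plain convolutions lack.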

Taxonomy

Topics: Advanced Vision and Imaging · Human Pose and Action Recognition · Video Surveillance and Tracking Methods