Detect-and-Track: Efficient Pose Estimation in Videos

Rohit Girdhar, Georgia Gkioxari, Lorenzo Torresani, Manohar Paluri, and Du Tran

arXiv:1712.09184 · cs.CV · May 4, 2018 · 27 citations

TL;DR

This paper introduces a lightweight two-stage method for multi-person human pose estimation and tracking in videos: keypoints are first estimated in individual frames or short clips, then linked across the whole video by a lightweight tracker.

Contribution

The paper presents a two-stage approach that combines frame-level pose estimation with lightweight tracking, and proposes a 3D extension of Mask R-CNN that uses temporal information from short clips to make frame-level keypoint predictions more robust.
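
The source does not say how the 3D model's weights are initialized. One common way to extend a 2D convolution to 3D is I3D-style "inflation": repeat the 2D kernel along a new time axis and divide by the temporal depth. The sketch below illustrates that idea on plain Python lists as an assumption, not as the authors' implementation; `inflate_kernel`, `response_2d`, and `response_3d` are hypothetical helpers.

```python
from typing import List

Kernel2D = List[List[float]]        # indexed [y][x]
Volume3D = List[List[List[float]]]  # indexed [t][y][x]


def inflate_kernel(k2d: Kernel2D, depth: int) -> Volume3D:
    """Hypothetical helper: repeat a 2D kernel `depth` times along time and
    divide every weight by `depth` ("mean" inflation)."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return [[[w / depth for w in row] for row in k2d] for _ in range(depth)]


def response_2d(patch: Kernel2D, kernel: Kernel2D) -> float:
    """Dot product of a 2D kernel with an equally sized image patch."""
    return sum(p * w for prow, krow in zip(patch, kernel) for p, w in zip(prow, krow))


def response_3d(clip: Volume3D, kernel: Volume3D) -> float:
    """Dot product of a 3D kernel with an equally sized clip (sum over time)."""
    return sum(response_2d(frame, k) for frame, k in zip(clip, kernel))


k2d = [[1.0, 2.0], [3.0, 4.0]]
patch = [[0.5, 1.0], [2.0, 0.0]]
# On a static clip (the same frame repeated), the inflated kernel reproduces
# the 2D response, so the 3D model starts out behaving like the 2D one.
static_clip = [patch] * 3
print(response_2d(patch, k2d), response_3d(static_clip, inflate_kernel(k2d, 3)))
```

The rescaling by `depth` is what keeps the 3D response equal to the 2D one on a static clip. Another option is to copy the 2D weights into the central time slice only and zero the others, which has the same property.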

Findings

1. Achieves 55.2% MOTA on the PoseTrack validation set
2. State-of-the-art performance on the ICCV 2017 PoseTrack challenge
3. Effective use of temporal information over short clips for more robust pose estimation
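
The findings report accuracy as MOTA (Multi-Object Tracking Accuracy), which the page does not define. The sketch below uses the standard CLEAR MOT definition, MOTA = 1 - (FN + FP + IDSW) / GT summed over frames. It assumes per-frame counts of misses, false positives, identity switches, and ground-truth objects are already available from matching predictions to annotations. `FrameCounts` and `mota` are hypothetical names, and how PoseTrack aggregates the score over keypoint types is not covered here.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FrameCounts:
    """Hypothetical per-frame error counts for one tracked target type."""
    false_negatives: int
    false_positives: int
    id_switches: int
    ground_truth: int


def mota(frames: Iterable[FrameCounts]) -> float:
    """CLEAR MOT accuracy: 1 - (FN + FP + IDSW) / GT, each summed over all frames."""
    fn = fp = idsw = gt = 0
    for f in frames:
        fn += f.false_negatives
        fp += f.false_positives
        idsw += f.id_switches
        gt += f.ground_truth
    if gt == 0:
        raise ValueError("MOTA is undefined without ground-truth objects")
    return 1.0 - (fn + fp + idsw) / gt


# 4 errors against 20 ground-truth objects gives 1 - 4/20 = 0.8
example = [FrameCounts(1, 0, 0, 10), FrameCounts(0, 2, 1, 10)]
print(mota(example))
```

Because errors are not capped by the number of ground-truth objects, MOTA can become negative when a tracker produces many false positives.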

Abstract

This paper addresses the problem of estimating and tracking human body keypoints in complex, multi-person video. We propose an extremely lightweight yet highly effective approach that builds upon the latest advancements in human detection and video understanding. Our method operates in two stages: keypoint estimation in frames or short clips, followed by lightweight tracking to generate keypoint predictions linked over the entire video. For frame-level pose estimation we experiment with Mask R-CNN, as well as our own proposed 3D extension of this model, which leverages temporal information over small clips to generate more robust frame predictions. We conduct extensive ablative experiments on the newly released multi-person video pose estimation benchmark, PoseTrack, to validate various design choices of our model. Our approach achieves an accuracy of 55.2% on the validation and 51.8%…
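
The abstract describes the second stage only as lightweight tracking that links per-frame predictions over the entire video. One simple way to do this, assumed here rather than taken from the source, is to match each frame's person boxes to the previous frame's boxes by intersection-over-union (IoU). The sketch uses greedy matching as a stand-in for an optimal bipartite assignment; `link_tracks` and its `min_iou` threshold are hypothetical.

```python
from itertools import count
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def link_tracks(frames: List[List[Box]], min_iou: float = 0.3) -> List[List[int]]:
    """Assign a track id to every detection, one frame at a time.

    Candidate pairs (previous box, current box) are visited from highest IoU
    down; each box is matched at most once. Unmatched current boxes start
    new tracks. `min_iou` is a hypothetical threshold."""
    new_id = count()
    all_ids: List[List[int]] = []
    prev_boxes: List[Box] = []
    prev_ids: List[int] = []
    for boxes in frames:
        pairs = sorted(
            ((iou(p, c), i, j) for i, p in enumerate(prev_boxes) for j, c in enumerate(boxes)),
            reverse=True,
        )
        cur_ids = [-1] * len(boxes)
        used_prev = set()
        for score, i, j in pairs:
            if score < min_iou:
                break  # remaining pairs overlap even less
            if i in used_prev or cur_ids[j] != -1:
                continue  # one-to-one matching only
            used_prev.add(i)
            cur_ids[j] = prev_ids[i]
        for j in range(len(boxes)):
            if cur_ids[j] == -1:
                cur_ids[j] = next(new_id)  # start a new track
        all_ids.append(cur_ids)
        prev_boxes, prev_ids = boxes, cur_ids
    return all_ids


video = [
    [(0, 0, 10, 10), (20, 20, 30, 30)],
    [(21, 21, 31, 31), (1, 1, 11, 11)],  # same two people, listed in swapped order
    [(100, 100, 110, 110)],              # an unrelated new person
]
print(link_tracks(video))
```

Swapping the greedy loop for an optimal assignment, for example `scipy.optimize.linear_sum_assignment` on a cost matrix of `1 - IoU`, gives a globally best one-to-one matching per frame pair at the cost of an extra dependency.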

Code & Models

Repositories

facebookresearch/DetectAndTrack (Caffe2)

Taxonomy

Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Advanced Vision and Imaging

Methods: Region Proposal Network · Softmax · RoIAlign · Convolution · Mask R-CNN