TL;DR
This paper introduces a personalized ConvNet pose estimator for video that adapts to an individual person's appearance, improving pose estimation in long videos over a generic ConvNet and outperforming state-of-the-art methods on standard benchmarks.
Contribution
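The paper presents a method that propagates a few high-precision pose annotations through a video using image-based matching and dense optical flow, filters the results with an occlusion-aware self-evaluation model, and fine-tunes a ConvNet pose estimator on the retained annotations to personalize it.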
Findings
Outperforms state-of-the-art methods on standard benchmarks.
Automatically generates high-quality annotations for personalization.
Fine-tuning a generic ConvNet pose estimator on the generated annotations improves its performance.
Abstract
We propose a personalized ConvNet pose estimator that automatically adapts itself to the uniqueness of a person's appearance to improve pose estimation in long videos. We make the following contributions: (i) we show that given a few high-precision pose annotations, e.g. from a generic ConvNet pose estimator, additional annotations can be generated throughout the video using a combination of image-based matching for temporally distant frames, and dense optical flow for temporally local frames; (ii) we develop an occlusion aware self-evaluation model that is able to automatically select the high-quality and reject the erroneous additional annotations; and (iii) we demonstrate that these high-quality annotations can be used to fine-tune a ConvNet pose estimator and thereby personalize it to lock on to key discriminative features of the person's appearance. The outcome is a substantial…
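As a rough illustration of the flow-based propagation step in contribution (i), the following Python sketch moves annotated joint positions to a neighboring frame using a precomputed dense flow field, then keeps only the points that pass a forward-backward consistency check. This is a minimal sketch under stated assumptions, not the paper's method: the function names, the `max_error` threshold, and the synthetic flow fields are hypothetical, and the consistency check is only a simple stand-in for the occlusion-aware self-evaluation model, whose details the abstract does not give.

```python
import numpy as np


def propagate_keypoints(keypoints, flow):
    """Shift (x, y) keypoints by the flow vector sampled at their nearest pixel.

    keypoints: array of shape (N, 2), columns are x and y.
    flow: array of shape (H, W, 2), flow[y, x] = (dx, dy).
    """
    h, w = flow.shape[:2]
    kp = np.asarray(keypoints, dtype=float)
    # Nearest-pixel lookup, clipped so points near the border stay in bounds.
    xs = np.clip(np.rint(kp[:, 0]).astype(int), 0, w - 1)
    ys = np.clip(np.rint(kp[:, 1]).astype(int), 0, h - 1)
    return kp + flow[ys, xs]


def forward_backward_keep(keypoints, flow_fwd, flow_bwd, max_error=1.0):
    """Propagate keypoints forward, then backward, and flag consistent ones.

    Hypothetical stand-in for a quality check: a point is kept when the
    round trip returns within max_error pixels of where it started.
    """
    start = np.asarray(keypoints, dtype=float)
    moved = propagate_keypoints(start, flow_fwd)
    returned = propagate_keypoints(moved, flow_bwd)
    error = np.linalg.norm(returned - start, axis=1)
    return moved, error <= max_error


# Synthetic example (assumed data): a uniform shift of (+3, -2) pixels.
flow_fwd = np.zeros((40, 50, 2))
flow_fwd[...] = (3.0, -2.0)
flow_bwd = np.zeros((40, 50, 2))
flow_bwd[...] = (-3.0, 2.0)
# Corrupt the backward flow at one pixel to mimic an unreliable region.
flow_bwd[13, 33] = (5.0, 5.0)

keypoints = np.array([[10.0, 20.0], [30.0, 15.0]])
moved, keep = forward_backward_keep(keypoints, flow_fwd, flow_bwd)
```

Here the first keypoint survives the round trip and is kept, while the second lands on the corrupted pixel and is rejected. In practice the flow fields would come from a dense optical flow estimator rather than being constructed by hand.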
Videos
Personalizing Human Video Pose Estimation (YouTube)
