Joint Coordinate Regression and Association For Multi-Person Pose Estimation, A Pure Neural Network Approach

Dongyang Yu; Yunshi Xie; Wangpeng An; Li Zhang; Yufeng Yao

arXiv:2307.01004 · cs.CV · April 22, 2024

TL;DR

This paper presents JCRA, a fast, accurate, and simple one-stage neural network for multi-person 2D pose estimation that directly predicts keypoints and associations without post-processing.

Contribution

The authors introduce a novel end-to-end network architecture with a symmetric transformer-based design that improves speed and accuracy in multi-person pose estimation.

Findings

01. JCRA achieves 69.2 mAP on the MS COCO benchmark.

02. JCRA is 78% faster at inference than previous methods.

03. JCRA outperforms state-of-the-art approaches in both accuracy and efficiency on the MS COCO and CrowdPose benchmarks.

Abstract

We introduce a novel one-stage end-to-end multi-person 2D pose estimation algorithm, known as Joint Coordinate Regression and Association (JCRA), that produces human pose joints and associations without requiring any post-processing. The proposed algorithm is fast, accurate, effective, and simple. The one-stage end-to-end network architecture significantly improves the inference speed of JCRA. Meanwhile, we devised a symmetric network structure for both the encoder and decoder, which ensures high accuracy in identifying keypoints. It follows an architecture that directly outputs part positions via a transformer network, resulting in a significant improvement in performance. Extensive experiments on the MS COCO and CrowdPose benchmarks demonstrate that JCRA outperforms state-of-the-art approaches in both accuracy and efficiency. Moreover, JCRA demonstrates 69.2 mAP and is 78% faster at…
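The abstract does not spell out the output format, so the following is only a rough sketch of what "directly outputs part positions" without post-processing could look like. It assumes a set-prediction style head in which each query emits one person confidence and a flat vector of normalized keypoint coordinates; the function name `decode_poses`, its parameters, and the 0.5 threshold are all hypothetical and are not taken from the paper. Under that assumption, every keypoint produced by one query already belongs to the same person, so no separate grouping or heatmap decoding step is needed.

```python
from typing import List, Tuple

# One (x, y) pixel pair per keypoint of a single person.
Pose = List[Tuple[float, float]]


def decode_poses(
    scores: List[float],
    coords: List[List[float]],
    image_size: Tuple[int, int],
    threshold: float = 0.5,
) -> List[Pose]:
    """Turn per-query outputs into poses (hypothetical output layout).

    scores[i]: assumed person confidence in [0, 1] for query i.
    coords[i]: assumed flat [x0, y0, x1, y1, ...] normalized to [0, 1].
    """
    width, height = image_size
    poses: List[Pose] = []
    for score, flat in zip(scores, coords):
        # Drop low-confidence queries; this is a plain threshold, not NMS.
        if score < threshold:
            continue
        # All keypoints of one query belong to one person by construction,
        # so association requires no extra step here.
        pose = [
            (flat[j] * width, flat[j + 1] * height)
            for j in range(0, len(flat), 2)
        ]
        poses.append(pose)
    return poses
```

For example, calling `decode_poses([0.9, 0.1], [[0.5, 0.5, 0.25, 0.75], [0.1, 0.1, 0.2, 0.2]], (640, 480))` keeps only the first query and scales its two keypoints to pixel coordinates.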

Taxonomy

Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Anomaly Detection Techniques and Applications

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings