Joint Coordinate Regression and Association For Multi-Person Pose Estimation, A Pure Neural Network Approach
Dongyang Yu, Yunshi Xie, Wangpeng An, Li Zhang, Yufeng Yao

TL;DR
This paper presents JCRA, a fast, accurate, and simple one-stage neural network for multi-person 2D pose estimation that directly predicts keypoints and associations without post-processing.
Contribution
The authors introduce a novel end-to-end network architecture with a symmetric transformer-based design that improves speed and accuracy in multi-person pose estimation.
Findings
JCRA achieves 69.2 mAP on the MS COCO benchmark.
JCRA is 78% faster at inference than previous methods.
JCRA outperforms state-of-the-art approaches in both accuracy and efficiency on the MS COCO and CrowdPose benchmarks.
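For context on the mAP figure above: COCO keypoint mAP is built on Object Keypoint Similarity (OKS), which scores a predicted pose against a ground-truth pose. The sketch below follows the standard COCO definition and is not code from the paper. The function name `oks` and its argument names are illustrative, and the per-keypoint `sigmas` are passed in by the caller rather than hard-coded.

```python
import numpy as np

def oks(pred, gt, visibility, area, sigmas):
    """Object Keypoint Similarity between one predicted and one ground-truth pose.

    pred, gt:    (K, 2) arrays of pixel coordinates
    visibility:  (K,) ground-truth visibility flags; > 0 means labeled
    area:        ground-truth object area in pixels (the scale term s^2)
    sigmas:      (K,) per-keypoint falloff constants supplied by the caller
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    labeled = np.asarray(visibility) > 0
    if not labeled.any():
        return 0.0  # no labeled keypoints to compare against
    d2 = np.sum((pred - gt) ** 2, axis=1)            # squared distance per keypoint
    k2 = (2.0 * np.asarray(sigmas, dtype=float)) ** 2  # k_i = 2 * sigma_i
    e = d2 / (2.0 * k2 * (area + np.finfo(float).eps))
    return float(np.mean(np.exp(-e[labeled])))       # average over labeled keypoints
```

A prediction counts as a true positive at a given OKS threshold, and COCO-style mAP averages precision over thresholds from 0.50 to 0.95 in steps of 0.05.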
Abstract
We introduce a novel one-stage end-to-end multi-person 2D pose estimation algorithm, known as Joint Coordinate Regression and Association (JCRA), that produces human pose joints and associations without requiring any post-processing. The proposed algorithm is fast, accurate, effective, and simple. The one-stage end-to-end network architecture significantly improves the inference speed of JCRA. Meanwhile, we devised a symmetric network structure for both the encoder and decoder, which ensures high accuracy in identifying keypoints. It follows an architecture that directly outputs part positions via a transformer network, resulting in a significant improvement in performance. Extensive experiments on the MS COCO and CrowdPose benchmarks demonstrate that JCRA outperforms state-of-the-art approaches in both accuracy and efficiency. Moreover, JCRA demonstrates 69.2 mAP and is 78% faster at…
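To illustrate what "directly outputs part positions" can look like in practice, here is a minimal sketch of reading poses out of a query-based transformer head. Everything in it is an assumption for illustration: the abstract does not specify the output layout, so the (queries, keypoints, 2) tensor of normalized coordinates, the per-query person score, the threshold value, and the name `decode_poses` are all hypothetical. Under this layout, association is implicit: every keypoint in a row belongs to the same person, so no grouping or heatmap decoding step is needed.

```python
import numpy as np

def decode_poses(coords, scores, image_size, score_threshold=0.5):
    """Turn raw per-query outputs into pixel-space poses (hypothetical layout).

    coords:      (Q, K, 2) normalized (x, y) in [0, 1], one row per person query
    scores:      (Q,) confidence that the query corresponds to a real person
    image_size:  (width, height) in pixels
    """
    coords = np.asarray(coords, dtype=float)
    scores = np.asarray(scores, dtype=float)
    width, height = image_size
    keep = scores >= score_threshold                 # drop low-confidence queries
    pixels = coords[keep] * np.array([width, height], dtype=float)
    return pixels, scores[keep]

# Example: two queries, two keypoints each; only the first query is confident.
coords = np.array([[[0.5, 0.5], [0.25, 0.75]],
                   [[0.1, 0.1], [0.2, 0.2]]])
poses, kept = decode_poses(coords, [0.9, 0.2], image_size=(640, 480))
```

The only work left after the forward pass in this sketch is a score filter and a rescale to pixels, in contrast with bottom-up pipelines that must group detected joints into people afterwards.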
Taxonomy
Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Anomaly Detection Techniques and Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
