SCAPE: A Simple and Strong Category-Agnostic Pose Estimator

Yujia Liang; Zixuan Ye; Wenze Liu; Hao Lu

arXiv:2407.13483 · cs.CV · July 19, 2024



TL;DR

SCAPE introduces a simplified, attention-based approach for category-agnostic pose estimation, achieving superior accuracy and efficiency over prior methods by focusing on feature matching within a streamlined architecture.

Contribution

The paper proposes a simple, strong baseline for CAPE built from pure self-attention layers, and introduces two modules that improve attention quality. The resulting model outperforms prior art in both accuracy and speed.

Findings

01 Outperforms prior methods by 2.2 and 1.3 PCK in the 1-shot and 5-shot settings, respectively

02 Faster inference and a lighter model than prior methods

03 More effective attention through a global keypoint feature perceptor and a keypoint attention refiner
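The gains above are reported in PCK (Percentage of Correct Keypoints). As a rough illustration of how such a score is usually computed, the sketch below counts a predicted keypoint as correct when it lies within a fraction `alpha` of a reference length from the ground truth. The function name `pck`, the threshold `alpha=0.2`, and the use of a bounding-box size as the reference length are assumptions for illustration; this page does not state the exact evaluation protocol used in the paper.

```python
import math

def pck(pred, gt, visible, bbox_size, alpha=0.2):
    """Fraction of visible keypoints predicted within alpha * bbox_size of the ground truth.

    alpha and bbox_size are illustrative assumptions, not values taken from the paper.
    """
    correct = 0
    total = 0
    for (px, py), (gx, gy), vis in zip(pred, gt, visible):
        if not vis:
            continue  # keypoints without annotation are skipped
        total += 1
        if math.hypot(px - gx, py - gy) <= alpha * bbox_size:
            correct += 1
    return correct / total if total else 0.0

# Toy example: four keypoints, the last one is not visible.
pred = [(10.0, 10.0), (50.0, 52.0), (90.0, 40.0), (0.0, 0.0)]
gt = [(12.0, 11.0), (50.0, 50.0), (60.0, 40.0), (5.0, 5.0)]
visible = [True, True, True, False]

score = pck(pred, gt, visible, bbox_size=100.0)
print(f"PCK = {score:.3f}")
```

In this toy case two of the three visible keypoints fall inside the 20-pixel radius. A reported gain of "2.2 PCK" refers to points on the 0 to 100 scale, that is, this fraction multiplied by 100.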

Abstract

Category-Agnostic Pose Estimation (CAPE) aims to localize keypoints on an object of any category given a few exemplars in an in-context manner. Prior arts involve sophisticated designs, e.g., sundry modules for similarity calculation and a two-stage framework, or take in extra heatmap generation and supervision. We notice that CAPE is essentially a task about feature matching, which can be solved within the attention process. Therefore we first streamline the architecture into a simple baseline consisting of several pure self-attention layers and an MLP regression head -- this simplification means that one only needs to consider the attention quality to boost the performance of CAPE. Towards an effective attention process for CAPE, we further introduce two key modules: i) a global keypoint feature perceptor to inject global semantic information into support keypoints, and ii) a keypoint…
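To make the baseline described in the abstract more concrete, here is a minimal NumPy sketch of the general idea: support keypoint tokens and query image tokens are concatenated, passed through a few self-attention layers, and the keypoint tokens are then fed to an MLP head that regresses 2D coordinates. Everything specific here is an assumption for illustration: single-head attention, random untrained weights, the token counts, the layer norm, and the sigmoid output are not taken from the paper, and the two additional modules are not modeled. It is not the official tiny-smart/SCAPE implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head self-attention with a residual connection (illustrative)."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # each row sums to 1
    return layer_norm(tokens + attn @ v), attn

def mlp_head(x, w1, b1, w2, b2):
    """Two-layer MLP that maps each keypoint token to normalized (x, y)."""
    h = np.maximum(x @ w1 + b1, 0.0)                # ReLU
    return 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))     # sigmoid keeps outputs in (0, 1)

rng = np.random.default_rng(0)
dim, num_kpts, num_patches, num_layers = 32, 5, 64, 3  # hypothetical sizes

# Hypothetical inputs: one token per support keypoint, one per query image patch.
support_kpt_tokens = rng.normal(size=(num_kpts, dim))
query_patch_tokens = rng.normal(size=(num_patches, dim))
tokens = np.concatenate([support_kpt_tokens, query_patch_tokens], axis=0)

# Feature matching happens inside attention: keypoint tokens attend to query patches.
for _ in range(num_layers):
    w_q, w_k, w_v = (rng.normal(scale=dim ** -0.5, size=(dim, dim)) for _ in range(3))
    tokens, attn = self_attention(tokens, w_q, w_k, w_v)

w1, b1 = rng.normal(scale=dim ** -0.5, size=(dim, dim)), np.zeros(dim)
w2, b2 = rng.normal(scale=dim ** -0.5, size=(dim, 2)), np.zeros(2)
coords = mlp_head(tokens[:num_kpts], w1, b1, w2, b2)  # one (x, y) per keypoint

print(coords.shape)
```

Because the head regresses coordinates directly from the keypoint tokens, no heatmap is generated in this sketch, which mirrors the abstract's point that performance then depends mainly on the quality of the attention maps.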


Code & Models

Repositories

tiny-smart/SCAPE (PyTorch, official)


Taxonomy

Topics: Robot Manipulation and Learning · EEG and Brain-Computer Interfaces · Mechanics and Biomechanics Studies

Methods: Softmax · Attention Is All You Need · Heatmap · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings