Attend to Who You Are: Supervising Self-Attention for Keypoint Detection and Instance-Aware Association
Sen Yang, Zhicheng Wang, Ze Chen, Yanjie Li, Shoukui Zhang, Zhibin Quan, Shu-Tao Xia, Yiping Bao, Erjin Zhou, Wankou Yang

TL;DR
This paper introduces a Transformer-based approach with supervised self-attention for improved multi-person keypoint detection and instance association, simplifying the process and enhancing accuracy.
Contribution
It proposes supervising self-attention in Transformers with instance masks to directly associate keypoints and obtain instance segmentation without complex post-processing.
Findings
Demonstrates effective keypoint detection and instance association on the COCO dataset.
Supervised self-attention provides instance-aware grouping without relying on pre-defined offsets.
Simplifies the pixel assignment pipeline for multi-person pose estimation.
Abstract
This paper presents a new method to solve keypoint detection and instance association using a Transformer. Bottom-up multi-person pose estimation models need to detect keypoints and learn associative information between keypoints. We argue that these problems can be entirely solved by a Transformer. Specifically, the self-attention in a Transformer measures dependencies between any pair of locations, which can provide association information for keypoint grouping. However, the naive attention patterns are still not subjectively controlled, so there is no guarantee that the keypoints will always attend to the instances to which they belong. To address this, we propose a novel approach of supervising self-attention for multi-person keypoint detection and instance association. By using instance masks to supervise self-attention to be instance-aware, we can assign the detected…
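The idea in the abstract can be illustrated with a minimal, pure-Python sketch. It is not the authors' implementation: the loss form (mean squared error against a binary mask), the max-normalization, the mean-attention assignment rule, and all function names are assumptions made for illustration only. It shows how the attention row of one keypoint location could be compared against instance masks, and how that row could then be used to pick the instance the keypoint belongs to.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_row(query, keys):
    """Scaled dot-product attention weights of one query over all keys."""
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys
    ]
    return softmax(scores)


def mask_supervision_loss(attn, mask):
    """Assumed loss: MSE between max-normalized attention and a binary mask."""
    peak = max(attn)
    return sum((a / peak - m) ** 2 for a, m in zip(attn, mask)) / len(attn)


def assign_instance(attn, masks):
    """Assumed rule: pick the mask that receives the highest mean attention.

    Every mask is expected to contain at least one foreground position.
    """
    def mean_in_mask(mask):
        return sum(a * m for a, m in zip(attn, mask)) / sum(mask)

    scores = [mean_in_mask(mask) for mask in masks]
    return max(range(len(masks)), key=lambda i: scores[i])


# Toy flattened feature map: positions 0-2 lie on person A, 3-5 on person B.
keys = [[3.0, 0.0]] * 3 + [[0.0, 3.0]] * 3
masks = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]]

# Attention row for a keypoint detected at position 1 (on person A).
attn = attention_row(keys[1], keys)
instance = assign_instance(attn, masks)
loss_own = mask_supervision_loss(attn, masks[0])
loss_other = mask_supervision_loss(attn, masks[1])
```

In this toy setup the keypoint's attention concentrates on positions inside its own instance, so the assumed loss is lower against the matching mask than against the other one, and the assignment rule selects the first instance.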
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Anomaly Detection Techniques and Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Dense Connections · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Label Smoothing · Softmax · Residual Connection
