K-Order Graph-oriented Transformer with GraAttention for 3D Pose and Shape Estimation
Weixi Zhao, Weiqiang Wang

TL;DR
This paper introduces KOG-Transformer, a novel graph-oriented attention network for 3D pose estimation, and GASE-Net for hand shape modeling, both demonstrating superior performance on benchmark datasets.
Contribution
The paper presents new graph-specific attention modules and a combined network architecture for improved 3D pose and hand shape estimation.
Findings
KOG-Transformer outperforms previous methods on Human3.6M.
GASE-Net accurately predicts hand shapes with strong generalization.
Proposed modules effectively model relationships in graph-structured data.
Abstract
We propose a novel attention-based 2D-to-3D pose estimation network for graph-structured data, named KOG-Transformer, and a 3D pose-to-shape estimation network for hand data, named GASE-Net. Previous 3D pose estimation methods have focused on various modifications to the graph convolution kernel, such as abandoning weight sharing or increasing the receptive field. Some of these methods employ attention-based non-local modules as auxiliary modules. To better model the relationships between nodes in graph-structured data and to fuse information from different neighbor nodes in a differentiated way, we make targeted modifications to the attention module and propose two modules designed for graph-structured data: graph relative positional encoding multi-head self-attention (GR-MSA) and K-order graph-oriented multi-head self-attention (KOG-MSA). By stacking GR-MSA and KOG-MSA, we…
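The abstract does not give the equations for GR-MSA or KOG-MSA, so the following is only a rough illustration of the two ideas it names, under stated assumptions. It assumes that a "graph relative positional encoding" can be modeled as an additive attention bias indexed by the hop distance between two joints, and that "K-order" attention can be modeled as masking out joint pairs more than k hops apart. It uses a single head with identity query/key/value projections and a toy chain skeleton. The function names (`hop_distances`, `graph_attention`) and the `hop_bias` parameter are hypothetical. This is not the authors' implementation.

```python
import math
from collections import deque


def hop_distances(num_nodes, edges):
    """All-pairs shortest-path hop counts via BFS from every node.

    Unreachable pairs keep math.inf. (Hypothetical helper.)
    """
    adj = [[] for _ in range(num_nodes)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    dist = [[math.inf] * num_nodes for _ in range(num_nodes)]
    for src in range(num_nodes):
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[src][v] == math.inf:
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    return dist


def softmax(scores):
    """Numerically stable softmax; -inf scores map to weight 0.0."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def graph_attention(x, dist, k, hop_bias):
    """Single-head self-attention over node features x (list of vectors).

    Assumption: Q = K = V = x (identity projections, for illustration only).
    - Pairs farther than k hops get a -inf score (assumed K-order mask).
    - Remaining scores get hop_bias[hop distance] added (assumed
      hop-distance relative positional bias); hop_bias needs k + 1 entries.
    Returns (output features, attention matrix).
    """
    n, d = len(x), len(x[0])
    scale = 1.0 / math.sqrt(d)
    out, attn = [], []
    for i in range(n):
        scores = []
        for j in range(n):
            if dist[i][j] > k:
                scores.append(-math.inf)  # outside the k-hop neighborhood
            else:
                dot = sum(a * b for a, b in zip(x[i], x[j])) * scale
                scores.append(dot + hop_bias[int(dist[i][j])])
        weights = softmax(scores)  # node i always attends to itself (0 hops)
        attn.append(weights)
        out.append([sum(w * x[j][c] for j, w in enumerate(weights)) for c in range(d)])
    return out, attn


if __name__ == "__main__":
    # Toy 5-joint chain 0-1-2-3-4 (hypothetical, not a real skeleton layout).
    edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
    dist = hop_distances(5, edges)
    x = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [-0.5, 0.5], [-1.0, 0.0]]
    out, attn = graph_attention(x, dist, k=1, hop_bias=[0.0, 0.5])
    print([round(w, 3) for w in attn[2]])
```

With k = 1, each joint mixes features only with itself and its direct neighbors. Raising k widens the receptive field, and with k at least the graph diameter the sketch reduces to ordinary global self-attention plus the hop-distance bias.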
Taxonomy
Topics: Hand Gesture Recognition Systems · Human Pose and Action Recognition · Robot Manipulation and Learning
Methods: Convolution
