THOR-Net: End-to-end Graformer-based Realistic Two Hands and Object Reconstruction with Self-supervision

Ahmed Tawfik Aboukhadra, Jameel Malik, Ahmed Elhayek, Nadia Robertini, and Didier Stricker

arXiv:2210.13853 · cs.CV · October 26, 2022 · 1 citation

TL;DR

THOR-Net is an end-to-end framework that combines GCNs, a Transformer, and self-supervision to reconstruct two interacting hands and an object from a single RGB image, targeting personalized Virtual and Augmented Reality applications.

Contribution

The paper introduces a GraFormer-based architecture with two stages (feature extraction, then reconstruction) and a self-supervised photometric loss for realistic 3D reconstruction of hands and objects from monocular RGB images.

Findings

01. Achieves state-of-the-art hand shape estimation on the HO-3D dataset.

02. Surpasses existing methods in hand pose accuracy on the H2O dataset.

03. Demonstrates effective reconstruction of two hands and an object from a single RGB image.

Abstract

Realistic reconstruction of two hands interacting with objects is a new and challenging problem that is essential for building personalized Virtual and Augmented Reality environments. Graph Convolutional Networks (GCNs) preserve the topology of hand poses and shapes by modeling them as graphs. In this work, we propose THOR-Net, which combines the power of GCNs, a Transformer, and self-supervision to realistically reconstruct two hands and an object from a single RGB image. Our network comprises two stages: the feature extraction stage and the reconstruction stage. In the feature extraction stage, a Keypoint RCNN extracts 2D poses, feature maps, heatmaps, and bounding boxes from a monocular RGB image. This 2D information is then modeled as two graphs and passed to the two branches of the reconstruction stage. The shape reconstruction…
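
The graph modeling step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the 21-joint hand skeleton, the symmetric normalization of the adjacency matrix, the single ReLU graph convolution, and the toy weight matrix are all assumptions chosen for illustration, and the names `EDGES`, `normalized_adjacency`, and `gcn_layer` are hypothetical. It shows only how 2D keypoints could be treated as node features on a hand-shaped graph and propagated along the skeleton.

```python
import math

# ASSUMPTION: a 21-joint hand skeleton (wrist = 0, four joints per finger).
# This is a common convention, not a layout confirmed by the source.
NUM_JOINTS = 21
EDGES = [(0, 1 + 4 * f) for f in range(5)] + [
    (1 + 4 * f + k, 2 + 4 * f + k) for f in range(5) for k in range(3)
]


def normalized_adjacency(num_nodes, edges):
    """Return D^-1/2 (A + I) D^-1/2 as a dense list of lists."""
    a = [[1.0 if i == j else 0.0 for j in range(num_nodes)]
         for i in range(num_nodes)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_nodes)]
            for i in range(num_nodes)]


def matmul(x, y):
    """Plain dense matrix product for lists of lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def gcn_layer(a_hat, features, weight):
    """One graph convolution: ReLU(A_hat @ X @ W)."""
    out = matmul(matmul(a_hat, features), weight)
    return [[max(0.0, v) for v in row] for row in out]


# Toy input: each node carries a made-up 2D keypoint (x, y).
keypoints_2d = [[0.01 * i, 0.02 * i] for i in range(NUM_JOINTS)]
# Hypothetical 2 -> 3 weight matrix; a trained network would learn this.
weight = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]

a_hat = normalized_adjacency(NUM_JOINTS, EDGES)
lifted = gcn_layer(a_hat, keypoints_2d, weight)  # 21 nodes x 3 features
```

Because the adjacency matrix encodes the skeleton, each joint's output mixes only its own features with those of its direct neighbors, which is one way a graph-based model can respect hand topology.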

Code & Models

Repositories

ataboukhadra/thor-net (PyTorch, official)

Taxonomy

Topics: Human Pose and Action Recognition · 3D Shape Modeling and Analysis · Hand Gesture Recognition Systems

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Dense Connections · Softmax · Position-Wise Feed-Forward Layer · Label Smoothing · Absolute Position Encodings · Layer Normalization