Collaborative Learning for 3D Hand-Object Reconstruction and Compositional Action Recognition from Egocentric RGB Videos Using Superquadrics

Tze Ho Elden Tse; Runyang Feng; Linfang Zheng; Jiho Park; Yixing Gao; Jihie Kim; Ales Leonardis; Hyung Jin Chang

arXiv:2501.07100 · cs.CV · January 14, 2025

TL;DR

This paper introduces a collaborative learning framework utilizing superquadrics for improved 3D hand-object reconstruction and compositional action recognition from egocentric RGB videos, addressing generalization to unseen objects and actions.

Contribution

The paper proposes using superquadrics as the 3D object shape representation, together with a collaborative learning approach that improves both action recognition and object reconstruction, especially for unseen objects and compositional actions.

Findings

1. Superquadrics represent object shape more faithfully than 3D bounding boxes.

2. The framework achieves significant accuracy gains in compositional action recognition.

3. Extensive evaluations show state-of-the-art performance on extended datasets.

Abstract

With the availability of egocentric 3D hand-object interaction datasets, there is increasing interest in developing unified models for hand-object pose estimation and action recognition. However, existing methods still struggle to recognise seen actions on unseen objects due to the limitations in representing object shape and movement using 3D bounding boxes. Additionally, the reliance on object templates at test time limits their generalisability to unseen objects. To address these challenges, we propose to leverage superquadrics as an alternative 3D object representation to bounding boxes and demonstrate their effectiveness on both template-free object reconstruction and action recognition tasks. Moreover, as we find that pure appearance-based methods can outperform the unified methods, the potential benefits from 3D geometric information remain unclear. Therefore, we study the…
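To make the contrast with bounding boxes concrete, the sketch below evaluates the standard superquadric inside-outside function (Barr's formulation), which is textbook background rather than the authors' implementation. The function name, the (a1, a2, a3) semi-axis and (eps1, eps2) exponent parameterisation, and the assumption that points are already expressed in the object's canonical frame (no rotation or translation) are all illustrative choices. Exponents near 0 give an almost cuboid shape, eps1 = eps2 = 1 gives an ellipsoid, so a box is roughly one corner of a continuous shape family.

```python
import numpy as np


def superquadric_inside_outside(points, scale, eps):
    """Standard superquadric inside-outside function (illustrative sketch).

    points: (N, 3) xyz coordinates, assumed already in the canonical frame.
    scale:  (a1, a2, a3) semi-axis lengths.
    eps:    (eps1, eps2) shape exponents; small values -> box-like, 1 -> ellipsoid.
    Returns F per point: F < 1 inside, F == 1 on the surface, F > 1 outside.
    """
    pts = np.asarray(points, dtype=float)
    a1, a2, a3 = scale
    e1, e2 = eps
    # Normalise by the semi-axes; absolute values keep fractional powers real.
    x = np.abs(pts[:, 0] / a1)
    y = np.abs(pts[:, 1] / a2)
    z = np.abs(pts[:, 2] / a3)
    # F = ((x^(2/e2) + y^(2/e2))^(e2/e1)) + z^(2/e1)
    xy = (x ** (2.0 / e2) + y ** (2.0 / e2)) ** (e2 / e1)
    return xy + z ** (2.0 / e1)


# A point near a corner of the unit cube.
corner = np.array([[0.9, 0.9, 0.9]])
near_box = superquadric_inside_outside(corner, (1.0, 1.0, 1.0), (0.1, 0.1))
sphere = superquadric_inside_outside(corner, (1.0, 1.0, 1.0), (1.0, 1.0))
# The corner lies inside the near-cuboid superquadric but outside the unit sphere.
print(near_box < 1.0, sphere > 1.0)
```

The same two shape exponents let one parametric model cover both boxy and rounded objects, which is the kind of flexibility the abstract contrasts with fixed 3D bounding boxes.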
