# H+O: Unified Egocentric Recognition of 3D Hand-Object Poses and Interactions

**Authors:** Bugra Tekin, Federica Bogo, Marc Pollefeys

arXiv: 1904.05349 · 2019-04-11

## TL;DR

This paper introduces a single feed-forward network that jointly estimates 3D hand and object poses, models their interactions, and recognizes object and action classes from egocentric RGB image sequences. It reports state-of-the-art results without relying on external detection modules.

## Contribution

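The authors propose a single architecture, trained end-to-end on single RGB images, that jointly predicts 3D hand and object poses, models hand-object interactions, and recognizes object and action classes. A temporal component then merges and propagates information across frames to recognize actions over a whole sequence.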
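The input/output contract the paper describes (a sequence of frames in, per-frame 3D poses plus one object label and one action label out) can be illustrated with a small sketch. Everything here is hypothetical: the names `FramePrediction`, `SequencePrediction`, and `aggregate` are invented for illustration, and the score averaging is a deliberately simple stand-in for the paper's learned temporal model, not the authors' method.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


@dataclass
class FramePrediction:
    """Hypothetical container for what is predicted from one RGB frame."""
    hand_points: List[Point3D]    # 3D hand pose as a list of keypoints
    object_points: List[Point3D]  # 3D object pose as a list of keypoints
    object_scores: List[float]    # one score per object class
    action_scores: List[float]    # one score per action class


@dataclass
class SequencePrediction:
    """Hypothetical container for the output over a whole frame sequence."""
    frames: List[FramePrediction]  # per-frame 3D hand and object poses
    object_class: int              # single object label for the sequence
    action_class: int              # single action label for the sequence


def mean_scores(rows: Sequence[Sequence[float]]) -> List[float]:
    """Average class scores column-wise across frames."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def argmax(xs: Sequence[float]) -> int:
    """Index of the largest score (first one wins on ties)."""
    return max(range(len(xs)), key=lambda i: xs[i])


def aggregate(frames: Sequence[FramePrediction]) -> SequencePrediction:
    """Assumption: averaging replaces the learned temporal model."""
    if not frames:
        raise ValueError("need at least one frame")
    obj = mean_scores([f.object_scores for f in frames])
    act = mean_scores([f.action_scores for f in frames])
    return SequencePrediction(list(frames), argmax(obj), argmax(act))


# Usage with two dummy frames (made-up values, one keypoint each).
frames = [
    FramePrediction([(0.0, 0.0, 0.5)], [(0.1, 0.0, 0.6)], [0.2, 0.8], [0.6, 0.3, 0.1]),
    FramePrediction([(0.0, 0.1, 0.5)], [(0.1, 0.1, 0.6)], [0.4, 0.6], [0.1, 0.7, 0.2]),
]
result = aggregate(frames)
```

The split into a per-frame record and a sequence-level record mirrors the summary's description: poses are reported for every frame, while the object and action categories are reported once for the entire sequence.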

## Key findings

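- Reports state-of-the-art performance, even compared with approaches that use depth data and ground-truth annotations.
- Needs only RGB input and no external detection algorithm; the network is trained end-to-end on single images.
- Merges and propagates information over time to infer interactions between hand and object trajectories and to recognize actions.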

## Abstract

We present a unified framework for understanding 3D hand and object interactions in raw image sequences from egocentric RGB cameras. Given a single RGB image, our model jointly estimates the 3D hand and object poses, models their interactions, and recognizes the object and action classes with a single feed-forward pass through a neural network. We propose a single architecture that does not rely on external detection algorithms but rather is trained end-to-end on single images. We further merge and propagate information in the temporal domain to infer interactions between hand and object trajectories and recognize actions. The complete model takes as input a sequence of frames and outputs per-frame 3D hand and object pose predictions along with the estimates of object and action categories for the entire sequence. We demonstrate state-of-the-art performance of our algorithm even in comparison to the approaches that work on depth data and ground-truth annotations.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.05349/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/1904.05349/full.md

## References

74 references — full list in the complete paper: https://tomesphere.com/paper/1904.05349/full.md

---
Source: https://tomesphere.com/paper/1904.05349