GlovEgo-HOI: Bridging the Synthetic-to-Real Gap for Industrial Egocentric Human-Object Interaction Detection

Alfio Spoto, Rosario Leonardi, Francesco Ragusa, and Giovanni Maria Farinella

arXiv:2601.09528·cs.CV·January 15, 2026

TL;DR

This paper introduces GlovEgo-HOI, a benchmark dataset for industrial egocentric human-object interaction detection, and GlovEgo-Net, a model that uses hand pose information. A data generation framework that augments real images with synthetic Personal Protective Equipment aims to improve robustness in safety-critical environments.

Contribution

The paper presents a data generation framework that combines synthetic data with diffusion-based augmentation of real images, a new industrial EHOI dataset (GlovEgo-HOI), and a model (GlovEgo-Net) that leverages hand pose cues for interaction detection.

Findings

01. Synthetic data augmentation improves model robustness.

02. GlovEgo-Net outperforms baseline methods.

03. Public release of dataset and models facilitates future research.

Abstract

Egocentric Human-Object Interaction (EHOI) analysis is crucial for industrial safety, yet the development of robust models is hindered by the scarcity of annotated domain-specific data. We address this challenge by introducing a data generation framework that combines synthetic data with a diffusion-based process to augment real-world images with realistic Personal Protective Equipment (PPE). We present GlovEgo-HOI, a new benchmark dataset for industrial EHOI, and GlovEgo-Net, a model integrating Glove-Head and Keypoint-Head modules to leverage hand pose information for enhanced interaction detection. Extensive experiments demonstrate the effectiveness of the proposed data generation framework and GlovEgo-Net. To foster further research, we release the GlovEgo-HOI dataset, augmentation pipeline, and pre-trained models on the project's GitHub page.

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Robot Manipulation and Learning