TL;DR
This paper shows that synthetic data can significantly improve egocentric hand-object interaction (HOI) detection, especially when real labeled data are limited. The authors systematically study how aligning synthetic data with real data affects performance, and they release a new benchmark and a synthetic data generation pipeline.
Contribution
The authors introduce a synthetic data generation pipeline and the HOI-Synth benchmark, improving HOI detection and providing tools for synthetic data creation and evaluation.
Findings
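Combining synthetic data with only 10% of the real labeled data improves Overall AP by up to +11.69% on ENIGMA-51 (+5.67% on VISOR, +8.24% on EgoHOS).
The effectiveness of synthetic data consistently improves as it is better aligned with the target real-world benchmark in terms of objects, grasps, and environments.
These gains are measured against models trained exclusively on real data, which shows that synthetic data is most valuable when real labels are scarce.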
Abstract
In this work, we explore the role of synthetic data in improving the detection of Hand-Object Interactions (HOI) from egocentric images. Through extensive experimentation and comparative analysis on the VISOR, EgoHOS, and ENIGMA-51 datasets, our findings demonstrate the potential of synthetic data to significantly improve HOI detection, particularly when real labeled data are scarce or unavailable. By using synthetic data and only 10% of the real labeled data, we achieve improvements in Overall AP over models trained exclusively on real data, with gains of +5.67% on VISOR, +8.24% on EgoHOS, and +11.69% on ENIGMA-51. Furthermore, we systematically study how aligning synthetic data to specific real-world benchmarks with respect to objects, grasps, and environments affects performance, showing that the effectiveness of synthetic data consistently improves with better synthetic-real alignment. As a result of this work,…
