EgoInteract: Synthetic Egocentric Videos Generation for Interaction Understanding and Anticipation
Rosario Leonardi, Francesco Ragusa, Daniele Materia, Alessandro Passanisi, James Fort, Jakob Engel, Giovanni Maria Farinella

TL;DR
EgoInteract introduces a controllable egocentric video simulator that generates annotated synthetic data, improving model performance on real-world perception tasks involving human-object interactions.
Contribution
The paper presents a novel egocentric video simulator and a synthetic dataset, enabling better training for interaction understanding and anticipation tasks.
Findings
Models trained on synthetic data outperform baselines on real benchmarks.
The simulator provides precise control over interactions and scene composition.
Synthetic data improves performance across diverse environments and tasks.
Abstract
Collecting large-scale egocentric video datasets with dense spatial and temporal annotations is costly, slow, and often constrained by environmental biases, privacy constraints, and limited coverage of interaction patterns. While synthetic data has shown strong potential in several vision domains, its use for egocentric perception remains relatively underexplored, especially for tasks requiring temporally coherent human-object interactions. In this work, we introduce EgoInteract, a controllable simulator for egocentric video generation designed to model fine-grained egocentric interactions and their temporal dynamics. The simulator enables precise control over camera, human body and hand motion, object manipulation, and scene composition across diverse environments. Building on this framework, we generate a synthetic egocentric video dataset with dense spatial and temporal annotations…
