EgoEdit: Dataset, Real-Time Streaming Model, and Benchmark for Egocentric Video Editing
Runjia Li, Moayed Haji-Ali, Ashkan Mirzaei, Chaoyang Wang, Arpit Sahni, Ivan Skorokhodov, Aliaksandr Siarohin, Tomas Jakab, Junlin Han, Sergey Tulyakov, Philip Torr, Willi Menapace

TL;DR
EgoEdit introduces a new dataset, real-time editing model, and benchmark for egocentric videos, addressing domain-specific challenges like rapid motion and hand interactions to enable interactive AR applications.
Contribution
The paper presents EgoEdit, a novel real-time egocentric video editor, along with a curated dataset and benchmark, advancing interactive AR video editing capabilities.
Findings
EgoEdit achieves temporally stable, instruction-faithful editing results.
It outperforms existing methods on egocentric editing benchmarks.
It maintains performance comparable to the strongest baselines on general editing tasks.
Abstract
We study instruction-guided editing of egocentric videos for interactive AR applications. While recent AI video editors perform well on third-person footage, egocentric views present unique challenges, including rapid egomotion and frequent hand-object interactions, that create a significant domain gap. Moreover, existing offline editing pipelines suffer from high latency, limiting real-time interaction. To address these issues, we present a complete ecosystem for egocentric video editing. First, we construct EgoEditData, a manually curated dataset designed specifically for egocentric editing scenarios, featuring rich hand-object interactions while explicitly preserving hands. Second, we develop EgoEdit, an instruction-following egocentric video editor that supports real-time streaming inference on a single GPU. Finally, we introduce EgoEditBench, an evaluation…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Human Pose and Action Recognition
