EgoEdit: Dataset, Real-Time Streaming Model, and Benchmark for Egocentric Video Editing

Runjia Li; Moayed Haji-Ali; Ashkan Mirzaei; Chaoyang Wang; Arpit Sahni; Ivan Skorokhodov; Aliaksandr Siarohin; Tomas Jakab; Junlin Han; Sergey Tulyakov; Philip Torr; Willi Menapace

arXiv:2512.06065·cs.CV·December 9, 2025

TL;DR

EgoEdit introduces a new dataset, a real-time editing model, and a benchmark for egocentric videos, addressing domain-specific challenges such as rapid egomotion and hand-object interactions to enable interactive AR applications.

Contribution

The paper presents EgoEdit, a novel real-time egocentric video editor, along with a curated dataset and benchmark, advancing interactive AR video editing capabilities.

Findings

01. EgoEdit achieves temporally stable, instruction-faithful editing results.

02. It outperforms existing methods on egocentric editing benchmarks.

03. It maintains performance comparable to top baselines on general editing tasks.

Abstract

We study instruction-guided editing of egocentric videos for interactive AR applications. While recent AI video editors perform well on third-person footage, egocentric views present unique challenges, including rapid egomotion and frequent hand-object interactions, that create a significant domain gap. Moreover, existing offline editing pipelines suffer from high latency, limiting real-time interaction. To address these issues, we present a complete ecosystem for egocentric video editing. First, we construct EgoEditData, a manually curated dataset designed specifically for egocentric editing scenarios, featuring rich hand-object interactions while explicitly preserving hands. Second, we develop EgoEdit, an instruction-following egocentric video editor that supports real-time streaming inference on a single GPU. Finally, we introduce EgoEditBench, an evaluation…

Code & Models

Datasets

liguang0115/EgoEdit
dataset · 119 dl

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Human Pose and Action Recognition