EgoExoMem: Cross-View Memory Reasoning over Synchronized Egocentric and Exocentric Videos

Ruiping Liu; Junwei Zheng; Yufan Chen; Di Wen; Shaofang Quan; Chengzhi Wu; Jiaming Zhang; Kailun Yang; Kunyu Peng; Rainer Stiefelhagen

arXiv:2605.18734·cs.CV·May 19, 2026

TL;DR

EgoExoMem is a new benchmark for cross-view memory reasoning over synchronized egocentric and exocentric videos. It shows that the two views provide complementary memory cues for embodied intelligence, while existing MLLMs remain far from solving the task.

Contribution

The paper presents the first benchmark for cross-view memory reasoning over synchronized egocentric and exocentric videos and proposes E^2-Select, a training-free frame selection method for such videos.

Findings

01

Existing MLLMs remain far from solving the benchmark: the best model reaches only 55.3%.

02

E^2-Select outperforms other frame-selection and memory baselines, achieving 58.2%.

03

Experiments show that ego and exo views provide complementary memory cues, and they also reveal view-preference conflicts.

Abstract

Egocentric memory is widely used in embodied intelligence, but it may be insufficient for comprehensive spatial-temporal reasoning. Inspired by human recall from both field and observer perspectives, we introduce EgoExoMem, the first benchmark for cross-view memory reasoning over synchronized egocentric and exocentric videos. EgoExoMem contains 2.6K high-quality MCQs across eight temporal, spatial, and cross-view QA types. To support dual-view retrieval, we propose E^2-Select, a training-free frame selection method for synchronized ego-exo videos. It combines relevance-based budget allocation with per-view k-DPP sampling to handle view asymmetry and cross-view temporal consistency. Experiments show that ego and exo views provide complementary memory cues, while existing MLLMs remain far from solving the benchmark: the best model reaches only 55.3%. E^2-Select achieves…
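
The pairing of relevance-based budget allocation with per-view diversity-aware selection described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`allocate_budget`, `greedy_kdpp`, `select_frames`), the proportional allocation rule, and the cosine-similarity kernel are all hypothetical; per-frame relevance scores and feature vectors are assumed to be given; and exact k-DPP sampling is swapped for a deterministic greedy MAP approximation. The cross-view temporal consistency handling mentioned in the abstract is not modeled here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all-zero)."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def allocate_budget(ego_rel, exo_rel, budget):
    """Split the frame budget between views in proportion to total relevance.

    ASSUMPTION: proportional allocation is an illustrative rule only.
    """
    ego_mass, exo_mass = sum(ego_rel), sum(exo_rel)
    total = ego_mass + exo_mass
    k_ego = budget // 2 if total == 0 else round(budget * ego_mass / total)
    k_ego = max(0, min(k_ego, len(ego_rel), budget))
    k_exo = min(budget - k_ego, len(exo_rel))
    return k_ego, k_exo


def greedy_kdpp(quality, feats, k):
    """Greedy MAP approximation to a k-DPP (not exact sampling).

    Kernel: L[i][j] = quality[i] * quality[j] * cosine(feats[i], feats[j]).
    Each step picks the frame with the largest remaining marginal gain,
    which favors relevant frames that differ from those already chosen.
    """
    n = len(quality)

    def kernel(i, j):
        return quality[i] * quality[j] * cosine(feats[i], feats[j])

    gain = [kernel(i, i) for i in range(n)]  # residual variance per frame
    chol = [[] for _ in range(n)]            # incremental Cholesky rows
    selected = []
    while len(selected) < min(k, n):
        rest = [i for i in range(n) if i not in selected]
        j = max(rest, key=lambda i: gain[i])
        if gain[j] <= 1e-12:                 # nothing informative is left
            break
        selected.append(j)
        dj = math.sqrt(gain[j])
        for i in rest:
            if i == j:
                continue
            dot = sum(a * b for a, b in zip(chol[j], chol[i]))
            e = (kernel(j, i) - dot) / dj
            chol[i].append(e)
            gain[i] -= e * e
    return sorted(selected)


def select_frames(ego_rel, ego_feats, exo_rel, exo_feats, budget):
    """Allocate the budget across views, then select frames within each view."""
    k_ego, k_exo = allocate_budget(ego_rel, exo_rel, budget)
    return {
        "ego": greedy_kdpp(ego_rel, ego_feats, k_ego),
        "exo": greedy_kdpp(exo_rel, exo_feats, k_exo),
    }
```

In this sketch, a view whose frames are more relevant to the question receives a larger share of the budget, and within each view near-duplicate frames are penalized, so the selected set covers distinct moments rather than repeating the single most relevant one.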
