OSGNet with MLLM Reranking @ Ego4D Episodic Memory Challenge 2026

Yisen Feng; Leigang Qu; Haoyu Zhang; Qiaohui Chu; Meng Liu; Xuemeng Song; Weili Guan; Liqiang Nie

arXiv:2605.20818·cs.CV·May 21, 2026

TL;DR

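This paper introduces a reranking framework that pairs OSGNet candidate generation with a multimodal large language model (MLLM) to improve temporal localization in egocentric videos, taking first place in both the Natural Language Queries and GoalStep tracks of the Ego4D Episodic Memory Challenge.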

Contribution

Integrates OSGNet with MLLM reranking: OSGNet proposes candidate segments, and the MLLM selects the one that best matches the query to refine the final prediction.

Findings

01. Achieved first place in both challenge tracks.

02. Effective combination of existing localization and reasoning models.

03. Improved candidate selection accuracy.

Abstract

In this report, we present our champion solutions for the Natural Language Queries and GoalStep tracks of the Ego4D Episodic Memory Challenge at CVPR 2026. Both tracks require accurately localizing temporal segments in long, untrimmed egocentric videos. To address these tasks, we propose a reranking-based framework that leverages the strong video-language reasoning capability of a multimodal large language model (MLLM) while preserving the efficiency and candidate recall of conventional localization pipelines. Specifically, we first obtain a set of candidate segments from the existing localization model OSGNet, and then employ an MLLM to select the segment that best matches the given query, thereby refining the final prediction. Our method achieved first place in both the Natural Language Queries and GoalStep tracks. Our code can be found at…
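
The two-stage idea in the abstract (a localizer proposes candidates, then a selector picks one) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Segment` type, the `rerank` function, the `top_k` cutoff, the fallback to the localizer's top-1 candidate, and the stand-in selector are all hypothetical. The abstract does not say how many candidates are passed to the MLLM or how it is prompted, so the selector is left as a pluggable callable.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Segment:
    """Hypothetical candidate segment as a localizer might emit it."""
    start: float  # seconds
    end: float    # seconds
    score: float  # localizer confidence, higher is better


def rerank(
    query: str,
    candidates: Sequence[Segment],
    choose: Callable[[str, List[Segment]], int],
    top_k: int = 5,  # assumed shortlist size; not given in the source
) -> Segment:
    """Shortlist the top_k candidates by score, then let `choose` pick one."""
    if not candidates:
        raise ValueError("need at least one candidate segment")
    shortlist = sorted(candidates, key=lambda s: s.score, reverse=True)[:top_k]
    idx = choose(query, shortlist)
    # Assumed safeguard: keep the localizer's top-1 if the selector
    # returns an index outside the shortlist.
    if not 0 <= idx < len(shortlist):
        return shortlist[0]
    return shortlist[idx]


def longest_segment_selector(query: str, shortlist: List[Segment]) -> int:
    """Stand-in for an MLLM call; it simply picks the longest segment."""
    return max(
        range(len(shortlist)),
        key=lambda i: shortlist[i].end - shortlist[i].start,
    )


candidates = [
    Segment(10.0, 12.0, 0.9),
    Segment(40.0, 47.5, 0.8),
    Segment(3.0, 4.0, 0.1),
]
best = rerank("where did I put the keys?", candidates,
              longest_segment_selector, top_k=2)
print(best)
```

In a real pipeline, `choose` would wrap a call to an MLLM that sees the query together with the video clips for each shortlisted segment. Keeping it behind a plain callable leaves the candidate-generation stage untouched, which is how the abstract describes preserving the localizer's efficiency and candidate recall.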

Code & Models

Repositories

iLearn-Lab/CVPR25-OSGNet (GitHub)
