MARS: Technical Report for the CASTLE Challenge at EgoVis 2026

Haoyu Zhang; Qiaohui Chu; Yisen Feng; Meng Liu; Weili Guan; Yaowei Wang; Liqiang Nie

arXiv:2605.18176·cs.CV·May 19, 2026

TL;DR

MARS is a multimodal reasoning system for the CASTLE Challenge that selects evidence from videos, transcripts, and auxiliary modalities to answer closed-form questions spanning four days of activity.

Contribution

The paper introduces MARS, an agentic evidence-selection approach that reasons over multimodal sources, rather than text alone, to answer questions on the CASTLE egocentric benchmark.

Findings

01 Achieved second place on the CASTLE Challenge leaderboard.

02 Effectively integrates multiple modalities including videos, transcripts, gaze, and thermal imagery.

03 Uses GPT-5.4 as a decision agent for evidence selection and reasoning.

Abstract

This report presents MARS, short for Multimodal Agentic Reasoning with Source selection, our system for the CASTLE Challenge at EgoVis 2026. Participants must answer 185 closed-form questions over the CASTLE 2024 dataset. In contrast to prior single-video egocentric benchmarks, CASTLE requires reasoning over four days of activity, 15 synchronized perspectives, official transcripts, and multiple auxiliary modalities, including personal photos, auxiliary videos, gaze, thermal imagery, and heartrate measurements. MARS therefore treats the task as an agentic evidence-selection problem over multimodal sources rather than a purely text-only pipeline. MARS first follows the official CASTLE directory organization to build evidence memories from two primary sources, videos and transcripts, and four auxiliary sources, gaze, heartrate, photos, and thermal imagery. Long videos are converted into…
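The source-selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the report uses an agent to pick evidence, whereas this example swaps in a simple keyword heuristic as a stand-in. The source names come from the abstract; the function `select_sources` and the cue words in `CUES` are hypothetical and exist only for illustration.

```python
# Source names follow the abstract; everything else is a hypothetical illustration.
PRIMARY = ("videos", "transcripts")
AUXILIARY = ("gaze", "heartrate", "photos", "thermal")

# Hypothetical cue words standing in for the agent's judgment about
# which auxiliary source is relevant to a question.
CUES = {
    "gaze": ("looking", "look at", "gaze"),
    "heartrate": ("heart", "pulse", "bpm"),
    "photos": ("photo", "picture"),
    "thermal": ("thermal", "temperature"),
}


def select_sources(question: str) -> list[str]:
    """Return the sources to consult for a question.

    Primary sources are always included; an auxiliary source is added
    only when one of its cue words appears in the question.
    """
    q = question.lower()
    chosen = list(PRIMARY)
    for name in AUXILIARY:
        if any(cue in q for cue in CUES[name]):
            chosen.append(name)
    return chosen


if __name__ == "__main__":
    print(select_sources("Whose heart rate was highest during cooking?"))
```

The design point the sketch tries to capture is that auxiliary modalities are consulted selectively, per question, instead of being merged into a single text stream up front.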

Code & Models

Repositories

Hyu-Zhang/MARS (GitHub)
