CuriosAI Submission to the EgoExo4D Proficiency Estimation Challenge 2025

Hayato Tanoue; Hiroki Nishihara; Yuma Suzuki; Takayuki Hori; Hiroki Takushima; Aiswariya Manojkumar; Yuki Shibata; Mitsuru Takeda; Fumika Beppu; Zhao Hengwei; Yuto Kanda; Daichi Yamaga

arXiv:2507.08022·cs.CV·July 14, 2025

TL;DR

This report describes two multi-view skill assessment methods submitted to the EgoExo4D Proficiency Estimation Challenge at CVPR 2025. The two-stage, scenario-conditioned pipeline reaches 47.8% accuracy, compared with 43.6% for the multi-task model.

Contribution

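The report contributes two approaches to multi-view skill assessment: a multi-task learning framework built on Sapiens-2B that jointly predicts proficiency and scenario labels, and a two-stage pipeline that pairs zero-shot scenario recognition with view-specific VideoMAE classifiers.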
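To make the multi-task idea concrete, here is a minimal sketch of a joint objective that sums a proficiency loss and a scenario loss computed from two heads on a shared backbone. The report does not state its loss formulation on this page, so the weighted-sum form, the `scenario_weight` parameter, and all function names are assumptions for illustration only.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Negative log-likelihood of the target class index."""
    return -math.log(softmax(logits)[target])

def multitask_loss(prof_logits, scen_logits, prof_label, scen_label,
                   scenario_weight=0.5):
    """Hypothetical joint loss: proficiency CE plus weighted scenario CE.

    scenario_weight is an assumed hyperparameter, not a value from the report.
    """
    prof_loss = cross_entropy(prof_logits, prof_label)
    scen_loss = cross_entropy(scen_logits, scen_label)
    return prof_loss + scenario_weight * scen_loss
```

In a real model the two logit vectors would come from separate linear heads on top of shared backbone features; the auxiliary scenario term is what lets scenario supervision shape those shared features.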

Findings

01

Multi-task learning approach achieved 43.6% accuracy.

02

Two-stage pipeline achieved 47.8% accuracy.

03

Scenario-conditioned modeling improves proficiency estimation: the two-stage pipeline outperforms the multi-task model by 4.2 percentage points.

Abstract

This report presents the CuriosAI team's submission to the EgoExo4D Proficiency Estimation Challenge at CVPR 2025. We propose two methods for multi-view skill assessment: (1) a multi-task learning framework using Sapiens-2B that jointly predicts proficiency and scenario labels (43.6% accuracy), and (2) a two-stage pipeline combining zero-shot scenario recognition with view-specific VideoMAE classifiers (47.8% accuracy). The superior performance of the two-stage approach demonstrates the effectiveness of scenario-conditioned modeling for proficiency estimation.
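The control flow of a two-stage, scenario-conditioned pipeline can be sketched as a simple routing step: first predict the scenario, then dispatch the clip to a classifier chosen for that scenario and camera view. This is only an illustration under stated assumptions. The abstract does not say how classifiers are indexed, so keying them by a `(scenario, view)` pair is a guess, and the recognizer and classifiers below are hypothetical stubs standing in for the zero-shot model and the VideoMAE classifiers.

```python
from typing import Any, Callable, Dict, Tuple

Clip = Any  # placeholder type for a video clip
ScenarioRecognizer = Callable[[Clip], str]
ProficiencyClassifier = Callable[[Clip], int]

def two_stage_predict(
    clip: Clip,
    view: str,
    recognize_scenario: ScenarioRecognizer,
    classifiers: Dict[Tuple[str, str], ProficiencyClassifier],
) -> Tuple[str, int]:
    """Stage 1: predict the scenario. Stage 2: route to a specialised classifier."""
    scenario = recognize_scenario(clip)
    # Assumed indexing scheme: one classifier per (scenario, view) pair.
    classifier = classifiers[(scenario, view)]
    return scenario, classifier(clip)

# Hypothetical stubs; scenario and view names are placeholders.
def stub_recognizer(clip: Clip) -> str:
    return clip["scenario_hint"]

stub_classifiers: Dict[Tuple[str, str], ProficiencyClassifier] = {
    ("scenario_a", "ego"): lambda clip: 2,
    ("scenario_a", "exo"): lambda clip: 1,
}

result = two_stage_predict(
    {"scenario_hint": "scenario_a"}, "ego", stub_recognizer, stub_classifiers
)
```

One consequence of this design is that a stage-1 scenario error sends the clip to the wrong specialised classifier, so overall accuracy depends on both stages.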
