CuriosAI Submission to the EgoExo4D Proficiency Estimation Challenge 2025
Hayato Tanoue, Hiroki Nishihara, Yuma Suzuki, Takayuki Hori, Hiroki Takushima, Aiswariya Manojkumar, Yuki Shibata, Mitsuru Takeda, Fumika Beppu, Zhao Hengwei, Yuto Kanda, Daichi Yamaga

TL;DR
This report describes two multi-view skill assessment methods submitted to the EgoExo4D Proficiency Estimation Challenge at CVPR 2025: a multi-task learning framework (43.6% accuracy) and a two-stage, scenario-conditioned pipeline (47.8% accuracy).
Contribution
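The report presents two approaches to multi-view proficiency estimation: a multi-task learning framework built on Sapiens-2B that jointly predicts proficiency and scenario labels, and a two-stage pipeline that combines zero-shot scenario recognition with view-specific VideoMAE classifiers.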
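Jointly predicting proficiency and scenario labels is commonly trained with a weighted sum of two classification losses. The sketch below illustrates that general idea in plain Python. The function names, the `scenario_weight` value, and the choice of softmax cross-entropy are assumptions made for illustration; the report summary does not state the loss the authors actually used.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for a single example (stable log-sum-exp form)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def joint_loss(prof_logits, prof_target, scen_logits, scen_target,
               scenario_weight=0.5):
    """Weighted sum of the proficiency loss and the auxiliary scenario loss.

    scenario_weight is a hypothetical hyperparameter, not a value from the report.
    """
    prof_loss = cross_entropy(prof_logits, prof_target)
    scen_loss = cross_entropy(scen_logits, scen_target)
    return prof_loss + scenario_weight * scen_loss
```

With a shared backbone, both heads receive gradients from this single scalar, so the scenario task acts as an auxiliary signal for the proficiency head.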
Findings
The multi-task learning approach achieved 43.6% accuracy.
The two-stage pipeline achieved 47.8% accuracy.
The two-stage pipeline outperformed the multi-task approach by 4.2 percentage points, which the authors attribute to scenario-conditioned modeling.
Abstract
This report presents the CuriosAI team's submission to the EgoExo4D Proficiency Estimation Challenge at CVPR 2025. We propose two methods for multi-view skill assessment: (1) a multi-task learning framework using Sapiens-2B that jointly predicts proficiency and scenario labels (43.6% accuracy), and (2) a two-stage pipeline combining zero-shot scenario recognition with view-specific VideoMAE classifiers (47.8% accuracy). The superior performance of the two-stage approach demonstrates the effectiveness of scenario-conditioned modeling for proficiency estimation.
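The control flow of a two-stage, scenario-conditioned pipeline can be sketched as a simple dispatch: first recognize the scenario, then hand the clip to a classifier chosen for that scenario. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The names `two_stage_predict`, `recognize_scenario`, `classifiers`, and `fallback` are hypothetical, the scenario and proficiency labels are placeholders, and the lambdas stand in for the zero-shot recognizer and the VideoMAE classifiers mentioned in the abstract.

```python
from typing import Callable, Dict, Sequence

Clip = Sequence[float]  # placeholder for real multi-view video input


def two_stage_predict(
    clip: Clip,
    recognize_scenario: Callable[[Clip], str],
    classifiers: Dict[str, Callable[[Clip], str]],
    fallback: Callable[[Clip], str],
) -> str:
    """Stage 1: predict the scenario. Stage 2: run the classifier chosen for it."""
    scenario = recognize_scenario(clip)
    # Fall back to a generic classifier if the scenario has no dedicated model.
    classifier = classifiers.get(scenario, fallback)
    return classifier(clip)


# Toy stand-ins (assumptions): real models would replace these lambdas.
def toy_recognizer(clip: Clip) -> str:
    return "cooking" if clip[0] > 0 else "basketball"


toy_classifiers = {
    "cooking": lambda clip: "expert" if sum(clip) > 1.0 else "novice",
    "basketball": lambda clip: "expert" if max(clip) > 0.9 else "novice",
}


def toy_fallback(clip: Clip) -> str:
    return "novice"


prediction = two_stage_predict([0.7, 0.6], toy_recognizer, toy_classifiers, toy_fallback)
```

Keeping the recognizer and the per-scenario classifiers as separate callables means either stage can be swapped or retrained without touching the other.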
