AI-Assisted Competency Assessment from Egocentric Video in Simulation-Based Nursing Education

Hanchen David Wang; Yilin Liu; Madison J. Lee; Surya Chand Rayala; Gautam Biswas; Daniel T. Levin; Meiyi Ma

arXiv:2605.20233 · cs.CV · May 21, 2026

TL;DR

This study explores whether vision-language models can assess nursing students' competency from egocentric simulation video. It finds that sessions rated as more competent tend to be harder to classify, so recognition accuracy itself carries a signal about competency.

Contribution

The paper introduces a three-stage framework that uses frozen visual encoders and few-shot learning to assess competency from egocentric video, and it examines how recognition accuracy relates to instructor-rated competency.
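
One common way to pair a frozen encoder with few-shot learning is nearest-prototype classification: each action class is represented by the embedding of its labeled example(s), and a new frame takes the label of the most similar prototype. The sketch below assumes that setup. The paper's actual classifier is not described on this page, and the function names, action labels, and toy 2-D embeddings are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify_frame(embedding, prototypes):
    """Return the label whose prototype embedding is most similar to `embedding`.

    `prototypes` maps an action label to one embedding vector, e.g. the
    frozen-encoder feature of the single labeled example in a 1-shot setting.
    """
    return max(prototypes, key=lambda label: cosine_similarity(embedding, prototypes[label]))


# Hypothetical labels and toy embeddings, for illustration only.
prototypes = {
    "hand_hygiene": [1.0, 0.0],
    "check_monitor": [0.0, 1.0],
}
predicted_label = classify_frame([0.9, 0.2], prototypes)
```

Because the encoder stays frozen, only the prototypes change from session to session, which is what makes a leave-one-out 1-shot protocol cheap to run.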

Findings

01

A frozen DINOv2 backbone with HMM Viterbi decoding achieves 57.4% MOF (mean over frames) in leave-one-out 1-shot action recognition.

02

Higher competency correlates with more diverse and harder-to-classify workflows.

03

Recognition accuracy may serve as an informative signal for competency assessment.
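
For readers unfamiliar with the metric, MOF in the action segmentation literature usually means mean over frames, that is, the fraction of video frames whose predicted label matches the annotation. The sketch below assumes that reading. The paper's exact definition is not given on this page, and the function name and toy labels are hypothetical.

```python
def mean_over_frames(predicted, ground_truth):
    """Frame-wise accuracy: share of frames whose predicted label equals the annotation."""
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same number of frames")
    if not ground_truth:
        raise ValueError("at least one frame is required")
    correct = sum(p == g for p, g in zip(predicted, ground_truth))
    return correct / len(ground_truth)


# Toy per-frame labels for a single session (illustrative only).
session_mof = mean_over_frames(["a", "a", "b", "b"], ["a", "b", "b", "b"])
```

Computed per session, a score like this is the kind of recognition metric that could then be compared against instructor ratings.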

Abstract

Assessing learner competency in clinical simulation requires expert observation that is time-intensive, difficult to scale, and subject to inter-rater variability. Vision-language models have emerged as a promising tool for understanding complex visual behavior. In this work, we investigate whether visual observations can provide educationally meaningful signals for competency assessment through a three-stage framework that (1) extracts action timelines from egocentric nursing simulation video using frozen visual encoders and few-shot learning, (2) derives sequence-level features and per-session recognition metrics, and (3) relates these to instructor-rated competency. Across 22 densely annotated sessions (3.8 hours, 493 actions), a frozen DINOv2 backbone with HMM Viterbi decoding achieves 57.4% MOF in leave-one-out 1-shot recognition. Surprisingly, we observe a negative trend between…
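
The HMM Viterbi decoding mentioned in the abstract can be illustrated with a minimal log-domain sketch. It assumes that per-frame class scores (for example, from a few-shot classifier) serve as emission log-probabilities and that a transition matrix favoring self-transitions is supplied. How the authors estimate these quantities is not stated here, and all names and numbers below are illustrative.

```python
import math


def viterbi(log_emissions, log_transitions, log_initial):
    """Most likely state sequence under an HMM, computed in the log domain.

    log_emissions:   T x K list, log-score of each of K states at each of T frames
    log_transitions: K x K list, log_transitions[j][k] = log P(state k | previous state j)
    log_initial:     length-K list of initial state log-probabilities
    """
    if not log_emissions:
        return []
    num_states = len(log_initial)
    # best[t][k]: best log-score of any path that ends in state k at frame t
    best = [[log_initial[k] + log_emissions[0][k] for k in range(num_states)]]
    backpointers = []
    for t in range(1, len(log_emissions)):
        row, pointers = [], []
        for k in range(num_states):
            prev = max(range(num_states), key=lambda j: best[-1][j] + log_transitions[j][k])
            row.append(best[-1][prev] + log_transitions[prev][k] + log_emissions[t][k])
            pointers.append(prev)
        best.append(row)
        backpointers.append(pointers)
    # Trace back from the best final state to recover the full path.
    state = max(range(num_states), key=lambda k: best[-1][k])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Toy two-state example: frame 2 is noisy and, taken alone, favors state 1.
frame_probs = [[0.9, 0.1], [0.9, 0.1], [0.4, 0.6], [0.9, 0.1], [0.9, 0.1]]
log_emissions = [[math.log(p) for p in frame] for frame in frame_probs]
log_transitions = [[math.log(0.9), math.log(0.1)], [math.log(0.1), math.log(0.9)]]
log_initial = [math.log(0.5), math.log(0.5)]

decoded = viterbi(log_emissions, log_transitions, log_initial)
per_frame_argmax = [max(range(2), key=lambda k: frame[k]) for frame in frame_probs]
```

In this toy input the third frame on its own favors state 1, but the sticky transitions keep the decoded path in state 0 throughout. That temporal smoothing is the usual motivation for decoding a sequence jointly instead of taking a per-frame argmax.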
