AI-Assisted Competency Assessment from Egocentric Video in Simulation-Based Nursing Education
Hanchen David Wang, Yilin Liu, Madison J. Lee, Surya Chand Rayala, Gautam Biswas, Daniel T. Levin, Meiyi Ma

TL;DR
This study explores using vision-language models to assess nursing students' competency from egocentric simulation videos, revealing that recognition accuracy correlates with competency levels.
Contribution
It introduces a three-stage framework leveraging frozen visual encoders and few-shot learning for competency assessment from egocentric videos, highlighting the relationship between recognition accuracy and competency.
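The few-shot recognition stage can be sketched as nearest-prototype matching over frozen frame features. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that per-frame feature vectors have already been extracted by a frozen encoder such as DINOv2. It also assumes that the single labeled support example per action class is reduced to one prototype by averaging its frame features, and that each query frame is scored by cosine similarity. The function names and action labels are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]

# Assumption: prototype matching with cosine similarity. The source names
# "frozen visual encoders and few-shot learning" but not the exact classifier.

def mean_vector(vectors: List[Vector]) -> List[float]:
    """Average a non-empty list of equal-length feature vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def build_prototypes(support: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """One prototype per action class from the frames of its single support clip."""
    return {label: mean_vector(frames) for label, frames in support.items()}

def frame_scores(frames: List[Vector], prototypes: Dict[str, List[float]]) -> List[Dict[str, float]]:
    """Similarity of every query frame to every class prototype."""
    return [{label: cosine(f, p) for label, p in prototypes.items()} for f in frames]

def classify_frames(frames: List[Vector], prototypes: Dict[str, List[float]]) -> List[str]:
    """Label each frame independently with its most similar prototype."""
    return [max(s, key=s.get) for s in frame_scores(frames, prototypes)]

# Hypothetical 2-D "features" for two made-up action classes.
support = {"hand_hygiene": [[1.0, 0.0], [0.9, 0.1]], "check_vitals": [[0.0, 1.0], [0.1, 0.9]]}
print(classify_frames([[0.8, 0.2], [0.2, 0.7]], build_prototypes(support)))
# → ['hand_hygiene', 'check_vitals']
```

Frame-by-frame labels produced this way are typically noisy. That noise is what a temporal decoder such as the HMM Viterbi step mentioned in the abstract is meant to smooth.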
Findings
A frozen DINOv2 backbone with HMM Viterbi decoding achieves 57.4% MOF (mean-over-frames accuracy) in leave-one-out 1-shot action recognition.
Higher competency correlates with more diverse and harder-to-classify workflows.
Recognition accuracy may serve as an informative signal for competency assessment.
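The metric and the kind of analysis behind these findings can be sketched as follows. Here, MOF is computed as the fraction of frames whose predicted label matches the ground truth. Per-session MOF is then related to instructor ratings with a Spearman rank correlation. The choice of Spearman correlation, the function names, and the toy numbers are assumptions for illustration only. They are not the paper's analysis or data.

```python
from typing import List, Sequence

def mof(pred: Sequence[str], gold: Sequence[str]) -> float:
    """Mean over frames: fraction of frames whose predicted action label equals the ground truth."""
    if not gold or len(pred) != len(gold):
        raise ValueError("pred and gold must be non-empty and equally long")
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)

def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)

def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy, made-up numbers (not from the paper): per-session MOF and instructor ratings.
session_mof = [0.70, 0.62, 0.55, 0.48]
competency = [2, 3, 3, 4]
print(round(spearman(session_mof, competency), 3))
# → -0.949
```

A negative rho in this toy setup corresponds to the pattern described above. Sessions rated as more competent are the ones the recognizer labels less accurately. With only 22 sessions, any such coefficient should be read as a trend rather than a precise estimate.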
Abstract
Assessing learner competency in clinical simulation requires expert observation that is time-intensive, difficult to scale, and subject to inter-rater variability. Vision-language models have emerged as a promising tool for understanding complex visual behavior. In this work, we investigate whether visual observations can provide educationally meaningful signals for competency assessment through a three-stage framework that (1) extracts action timelines from egocentric nursing simulation video using frozen visual encoders and few-shot learning, (2) derives sequence-level features and per-session recognition metrics, and (3) relates these to instructor-rated competency. Across 22 densely annotated sessions (3.8 hours, 493 actions), a frozen DINOv2 backbone with HMM Viterbi decoding achieves 57.4% MOF in leave-one-out 1-shot recognition. Surprisingly, we observe a negative trend between…
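The HMM Viterbi decoding step can be sketched as temporal smoothing of noisy frame-wise predictions. This sketch assumes that each frame's per-class scores are log emission likelihoods, for instance log-normalized similarities to class prototypes. It also assumes a uniform start distribution and a "sticky" transition model. Under that model, the decoder stays in the current action with probability `self_prob` and otherwise switches uniformly to another action. These modeling choices and the `self_prob` default of 0.9 are hypothetical; the paper's exact HMM parameterization is not given here.

```python
import math
from typing import Dict, List

def viterbi(emissions: List[Dict[str, float]], self_prob: float = 0.9) -> List[str]:
    """Most likely label sequence under a sticky HMM (hypothetical parameterization).

    emissions[t][c] is treated as the log emission likelihood of class c at frame t.
    Requires 0 < self_prob < 1 when there is more than one class.
    """
    if not emissions:
        return []
    labels = list(emissions[0])
    k = len(labels)
    if k == 1:
        return [labels[0]] * len(emissions)
    log_stay = math.log(self_prob)
    log_switch = math.log((1.0 - self_prob) / (k - 1))

    def log_trans(prev: str, cur: str) -> float:
        return log_stay if prev == cur else log_switch

    # delta[c]: best log-probability of any label path ending in class c at the current frame.
    delta = {c: -math.log(k) + emissions[0][c] for c in labels}
    backptrs: List[Dict[str, str]] = []
    for frame in emissions[1:]:
        new_delta: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for c in labels:
            best_prev = max(labels, key=lambda p: delta[p] + log_trans(p, c))
            new_delta[c] = delta[best_prev] + log_trans(best_prev, c) + frame[c]
            ptr[c] = best_prev
        delta = new_delta
        backptrs.append(ptr)
    # Trace back from the highest-scoring final class.
    path = [max(labels, key=lambda c: delta[c])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]

# An isolated one-frame flip toward "B" is smoothed away by the sticky transitions.
strong_a = {"A": math.log(0.8), "B": math.log(0.2)}
flip_b = {"A": math.log(0.4), "B": math.log(0.6)}
print(viterbi([strong_a, strong_a, flip_b, strong_a, strong_a]))
# → ['A', 'A', 'A', 'A', 'A']
```

A larger `self_prob` favors fewer, longer action segments. A smaller value lets the decoded timeline follow the frame-wise scores more closely.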
