An Interpretable Closed-Loop Intelligent Tutoring System for Multimodal Affective Feedback in Asynchronous Presentation Training
Hung-Yue Suen, Kuo-En Hung

TL;DR
This paper introduces an interpretable closed-loop ITS that turns multimodal data into structured, rubric-aligned feedback for improving on-camera presentation skills, validated on 10,360 MOOC video segments and in a learner study.
Contribution
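The paper develops a three-layer interpretable feedback architecture that connects rubric-aligned multimodal scoring, audience-perceived expressive diagnostics, and retrieval-augmented conversational coaching, enabling scalable deliberate practice of presentation skills.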
Findings
Rubric-aligned scoring reached performance comparable to expert ratings (R² = 0.48-0.61).
Participants improved significantly on all seven BARS dimensions (d = 0.39-0.90).
Practice frequency correlated positively with posttest performance.
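As a rough illustration of how an effect size such as the d values above can be computed from pretest and posttest ratings, here is a minimal sketch. It assumes d is a Cohen's d standardized by the average of the pre and post variances; this summary does not state which variant the authors used, and the function name and sample ratings are hypothetical.

```python
from statistics import mean, stdev


def cohens_d(pre, post):
    """Standardized mean difference between posttest and pretest scores.

    Assumption: the denominator is the root of the averaged pre/post
    sample variances. Other variants (e.g. SD of the differences) exist.
    """
    pooled_sd = ((stdev(pre) ** 2 + stdev(post) ** 2) / 2) ** 0.5
    return (mean(post) - mean(pre)) / pooled_sd


# Hypothetical ratings on one rubric dimension, for illustration only.
pre_scores = [2.0, 3.0, 4.0]
post_scores = [3.0, 4.0, 5.0]
effect = cohens_d(pre_scores, post_scores)
```

A positive value means posttest ratings were higher than pretest ratings, expressed in standard-deviation units.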
Abstract
This paper presents an interpretable closed-loop Intelligent Tutoring System (ITS) that supports feedback-guided practice for developing on-camera oral presentation skills at scale. The system operationalizes a seven-dimensional Behaviorally Anchored Rating Scale (BARS) and implements a three-layer interpretable feedback architecture that connects rubric-aligned multimodal scoring, audience-perceived expressive diagnostics, and retrieval-augmented conversational coaching to support deliberate practice. Built on an XGBoost backbone, the ITS maps multimodal inputs (facial, vocal, textual, and oculomotor features) into evidence-based feedback that can be traced back to observable performance cues. Trained on 10,360 Massive Open Online Course (MOOC) video segments, the system achieved rubric-aligned scoring with performance levels comparable to expert ratings (R² = 0.48-0.61, Spearman's rho…
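The two agreement metrics named in the abstract, R² and Spearman's rho, can be sketched in plain Python as follows. This is a minimal illustration of the standard definitions, not the authors' evaluation code; the function names and the sample scores are hypothetical.

```python
from statistics import mean


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical expert ratings and model predictions, for illustration only.
expert = [3.0, 4.5, 2.0, 5.0, 3.5]
model = [3.2, 4.0, 2.5, 4.6, 3.9]
r2 = r_squared(expert, model)
rho = spearman_rho(expert, model)
```

R² measures how much of the variance in expert ratings the predictions explain, while Spearman's rho only checks whether the predictions order the presentations the same way the experts do, so the two can diverge.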
