Beyond peak accuracy: a stability-centric framework for reliable multimodal student engagement assessment

Ismail Said Almuniri; Hitham Alhussian; Norshakirah Aziz; Sallam O. F. Khairy; AlWaleed Sulaiman AlAbri; Zaid Fawaz Jarallah; Saidu Yahaya; Shamsuddeen Adamu

PMC · DOI: 10.1038/s41598-025-31215-7 · January 2, 2026

Open Access

TL;DR

This paper introduces a new framework for assessing student engagement using multimodal data, focusing on stability and interpretability to improve reliability.

Contribution

The novel contribution is a stability-centric framework combining class-aware loss, temporal augmentation, and SHAP-based interpretability for multimodal student engagement assessment.
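As an illustration of what a class-aware loss can look like, the following is a minimal sketch of inverse-frequency weighted cross-entropy. The exact loss used in the paper is not given in this summary, so the weighting scheme, the function names, and the toy labels below are all assumptions made for illustration.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Assumed weighting scheme (the common "balanced" heuristic),
    not necessarily the one used in the paper.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class), normalised by total weight."""
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    return total / sum(weights[y] for y in labels)


# Hypothetical, imbalanced toy labels: class 0 dominates.
labels = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
weights = inverse_frequency_weights(labels)

# Uniform predictions over three classes for every sample.
uniform = [[1 / 3, 1 / 3, 1 / 3] for _ in labels]
loss = weighted_cross_entropy(uniform, labels, weights)
```

Rare classes receive larger weights, so errors on them contribute more to the loss than errors on the majority class.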

Findings

01

The framework achieved a mean accuracy of 0.901 ± 0.043 and a mean macro F1 of 0.847 ± 0.068 under repeated cross-validation with multiple seeds, which the authors report as surpassing baseline models.

02

Temporal augmentation and ensemble diversity were identified as key contributors to model stability.

03

SHAP-based analysis provided reliable interpretability, linking predictions to behavioral and cognitive cues.

Abstract

Accurate assessment of student engagement is central to technology-enhanced learning, yet existing models remain constrained by class imbalance, instability across data splits, and limited interpretability. This study introduces a multimodal engagement assessment framework that addresses these issues through three complementary strategies: (1) class-aware loss functions to alleviate class imbalance, (2) temporal data augmentation and heterogeneous ensembling to enhance model stability, and (3) SHAP-based analysis of the most stable component for reliable interpretability. Reliability was established through repeated cross-validation with multiple seeds across seven deep learning architectures and the proposed ensemble. The framework achieved a mean accuracy of 0.901 ± 0.043 and a mean macro F1 of 0.847 ± 0.068, surpassing baselines such as ResNet (Accuracy = 0.917), Inception (Macro…
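The reliability protocol described above (repeated cross-validation with multiple seeds, summarised as mean ± standard deviation) can be sketched as follows. This is an assumption-laden illustration, not the authors' pipeline: the fold count, the seeds, the toy labels, and the majority-class placeholder model are all hypothetical.

```python
import random
import statistics
from collections import Counter, defaultdict


def stratified_folds(labels, k, seed):
    """Yield (train_idx, test_idx) pairs with similar class ratios per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Deal each class's samples round-robin across the folds.
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, folds[f]


def majority_baseline_accuracy(labels, train, test):
    """Placeholder model: always predict the most common training label."""
    majority = Counter(labels[i] for i in train).most_common(1)[0][0]
    return sum(labels[i] == majority for i in test) / len(test)


def repeated_cv(labels, k=5, seeds=(0, 1, 2)):
    """Return (mean, std) of accuracy over all folds and all seeds."""
    scores = [
        majority_baseline_accuracy(labels, train, test)
        for seed in seeds
        for train, test in stratified_folds(labels, k, seed)
    ]
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical imbalanced labels: 30 / 15 / 5 samples per class.
labels = [0] * 30 + [1] * 15 + [2] * 5
mean_acc, std_acc = repeated_cv(labels)
```

With this placeholder model every fold scores the same, so the spread is zero. A trained model would normally vary across folds and seeds, and that variation is what a stability-centric evaluation reports alongside the mean.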

Figures: 11

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Online Learning and Analytics · Emotion and Mood Recognition