Transformer-Driven Modeling of Variable Frequency Features for Classifying Student Engagement in Online Learning
Sandeep Mandia, Kuldeep Singh, Rajendra Mitharwal, Faisel Mushtaq, and Dimpal Janu

TL;DR
This paper introduces EngageFormer, a transformer-based model that effectively classifies student engagement in online learning using video data, achieving state-of-the-art accuracy on multiple datasets.
Contribution
The paper presents a novel transformer architecture with sequence pooling for engagement classification, demonstrating superior performance on diverse affective state datasets.
Findings
Achieved up to 99.16% accuracy on the BAUM-1 dataset.
State-of-the-art results on the DAiSEE and YawDD datasets.
Provides a baseline for future engagement classification research.
Abstract
The COVID-19 pandemic and the internet's availability have recently boosted online learning. However, monitoring engagement in online learning is a difficult task for teachers. In this context, timely automatic student engagement classification can help teachers make adaptive adjustments to meet students' needs. This paper proposes EngageFormer, a transformer-based architecture with sequence pooling that uses the video modality for engagement classification. The proposed architecture computes three views from the input video and processes them in parallel using transformer encoders; a global encoder then processes the representation from each encoder, and finally a multilayer perceptron (MLP) predicts the engagement level. A learning-centered affective state dataset is curated from existing open-source databases. The proposed method achieved an accuracy of 63.9%, 56.73%, 99.16%, 65.67%,…
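The abstract mentions sequence pooling without defining it. The sketch below is one plausible reading, not the authors' implementation: it assumes sequence pooling means an attention-weighted average of an encoder's output tokens, with one score per token produced by a linear map. The names `sequence_pool` and `fuse_views` and all weight values are hypothetical, and the transformer encoders and the MLP head are omitted entirely.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sequence_pool(tokens, w, b=0.0):
    """Collapse T token vectors (each of length d) into one vector of length d.

    Assumption: each token gets a scalar score w . token + b, the scores are
    softmax-normalised over the sequence, and the output is the weighted sum
    of the tokens. `w` and `b` stand in for learned parameters.
    """
    scores = [sum(wi * xi for wi, xi in zip(w, tok)) + b for tok in tokens]
    weights = softmax(scores)
    d = len(tokens[0])
    return [sum(weights[t] * tokens[t][j] for t in range(len(tokens)))
            for j in range(d)]


def fuse_views(view_tokens, view_weights, global_w):
    """Pool each view on its own, then pool the per-view vectors.

    This only mimics the data flow "three views -> one representation";
    the second pooling is a stand-in for the global stage, not a model of it.
    """
    pooled = [sequence_pool(toks, w) for toks, w in zip(view_tokens, view_weights)]
    return sequence_pool(pooled, global_w)


if __name__ == "__main__":
    # Three hypothetical views, each with two 2-dimensional tokens.
    views = [
        [[1.0, 2.0], [3.0, 4.0]],
        [[0.0, 1.0], [2.0, 3.0]],
        [[5.0, 5.0], [7.0, 9.0]],
    ]
    zero_w = [[0.0, 0.0]] * 3  # zero scores, so pooling reduces to a plain mean
    print(fuse_views(views, zero_w, [0.0, 0.0]))
```

With all-zero scoring weights the pooling degenerates to a simple average, which makes the behaviour easy to check by hand; non-zero weights let some tokens contribute more than others.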
Taxonomy
Topics: Online Learning and Analytics
