Attention-Guided Dual-Stream Learning for Group Engagement Recognition: Fusing Transformer-Encoded Motion Dynamics with Scene Context via Adaptive Gating
Saniah Kayenat Chowdhury, Muhammad E.H. Chowdhury

TL;DR
This paper introduces DualEngage, a dual-stream framework that combines person-level motion dynamics and scene context via adaptive gating for accurate group engagement recognition in classroom videos.
Contribution
It presents a novel dual-stream approach that models both individual motion and scene context, with adaptive fusion, for improved group engagement detection.
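This summary does not give the paper's gating equation. The sketch below shows how adaptive fusion of the two streams could work. It assumes a common element-wise gate computed from the concatenated motion and scene embeddings. The names `gated_fusion`, `W`, and `b` are hypothetical, and DualEngage's actual gate may be parameterized differently.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_fusion(h_motion: Vector, h_scene: Vector, W: Matrix, b: Vector) -> Vector:
    """Fuse motion-stream and scene-stream embeddings with an element-wise gate.

    Hypothetical formulation (an assumption, not confirmed by the paper):
        g     = sigmoid(W @ [h_motion; h_scene] + b)
        fused = g * h_motion + (1 - g) * h_scene
    W has d rows and 2d columns; b has d entries.
    """
    d = len(h_motion)
    if len(h_scene) != d or len(W) != d or len(b) != d:
        raise ValueError("dimension mismatch between streams, W, and b")
    if any(len(row) != 2 * d for row in W):
        raise ValueError("each row of W must have 2*d entries")

    concat = h_motion + h_scene  # list concatenation gives [h_motion; h_scene]
    fused = []
    for i in range(d):
        # Gate value for dimension i, between 0 and 1.
        g = sigmoid(sum(w * x for w, x in zip(W[i], concat)) + b[i])
        # g near 1 favors the motion stream; g near 0 favors scene context.
        fused.append(g * h_motion[i] + (1.0 - g) * h_scene[i])
    return fused
```

Because the gate is computed from both embeddings, the fused representation can rely on scene context for some feature dimensions and on motion for others, and this balance can change from clip to clip. A fixed mixing weight cannot adapt in that way.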
Findings
Achieved 96.21% accuracy on the Classroom Group Engagement Dataset.
Demonstrated the effectiveness of dual-stream fusion over single-stream models.
Conducted ablation studies confirming the contribution of each stream.
Abstract
Student engagement is crucial for improving learning outcomes in group activities. Highly engaged students not only perform better individually but also contribute to overall group success. However, most existing automated engagement recognition methods are designed for online classrooms or estimate engagement at the individual level. To address this gap, we propose DualEngage, a novel two-stream framework for group-level engagement recognition from in-classroom videos. It models engagement as a joint function of individual and group-level behaviors. The primary stream models person-level motion dynamics by detecting and tracking students, extracting dense optical flow with the Recurrent All-Pairs Field Transforms (RAFT) network, encoding temporal motion patterns with a transformer encoder, and finally aggregating per-student representations through attention pooling into a unified…
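The last step of the primary stream combines a variable number of tracked students into one group representation using attention pooling. The sketch below shows one way to do this. It assumes a single learned scoring vector `w` and a softmax over students. The names `attention_pool` and `w` are hypothetical, and the abstract does not specify the paper's exact pooling parameterization.

```python
import math
from typing import List, Tuple

Vector = List[float]


def attention_pool(students: List[Vector], w: Vector) -> Tuple[Vector, Vector]:
    """Pool per-student embeddings into a single group-level vector.

    Hypothetical formulation (an assumption): score_i = w . h_i,
    alpha = softmax(scores), pooled = sum_i alpha_i * h_i.
    Returns (pooled, alpha).
    """
    if not students:
        raise ValueError("need at least one student embedding")
    dim = len(students[0])
    if any(len(h) != dim for h in students) or len(w) != dim:
        raise ValueError("all embeddings and w must share one dimension")

    # One scalar relevance score per tracked student.
    scores = [sum(wj * hj for wj, hj in zip(w, h)) for h in students]

    # Numerically stable softmax: subtract the max score before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]

    # Weighted sum of student embeddings, one output dimension at a time.
    pooled = [sum(a * h[j] for a, h in zip(alpha, students)) for j in range(dim)]
    return pooled, alpha
```

For example, `attention_pool([[1.0, 0.0], [3.0, 2.0]], [0.0, 0.0])` weights both students equally and returns their mean. The softmax weights sum to one and do not depend on the order of the students, so the pooled vector stays the same however the tracker orders them. The weights `alpha` also show which students contribute most to the group representation.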
