Attention-Guided Dual-Stream Learning for Group Engagement Recognition: Fusing Transformer-Encoded Motion Dynamics with Scene Context via Adaptive Gating

Saniah Kayenat Chowdhury; Muhammad E.H. Chowdhury

arXiv:2604.10078 · cs.CV · April 14, 2026

TL;DR

This paper introduces DualEngage, a dual-stream framework that combines person-level motion dynamics and scene context via adaptive gating for accurate group engagement recognition in classroom videos.

Contribution

The paper presents a dual-stream approach that models individual motion and scene context separately and fuses them adaptively for group engagement recognition.
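
The summary does not spell out how the adaptive fusion is computed, so the following is only a minimal sketch of one common form of gated fusion, not the authors' implementation. It assumes both streams produce feature vectors of the same length and that a sigmoid gate, computed from their concatenation, decides per dimension how much of each stream to keep. The names `gated_fusion`, `weights`, and `bias` are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(motion, scene, weights, bias):
    """Blend two equal-length feature vectors with an element-wise gate.

    motion, scene: lists of length d (one vector per stream).
    weights: d rows of length 2 * d; bias: length d.
    Both are hypothetical learned parameters, not taken from the paper.
    """
    joint = motion + scene  # list concatenation: [motion; scene]
    fused, gate = [], []
    for row, b, m, s in zip(weights, bias, motion, scene):
        g = sigmoid(sum(w * x for w, x in zip(row, joint)) + b)
        gate.append(g)
        # g close to 1 keeps the motion feature, g close to 0 keeps the scene feature.
        fused.append(g * m + (1.0 - g) * s)
    return fused, gate


motion = [0.8, -0.2, 0.5]
scene = [0.1, 0.4, -0.3]
zero_w = [[0.0] * 6 for _ in range(3)]

# With all-zero parameters every gate value is 0.5, so the output is the plain average.
fused, gate = gated_fusion(motion, scene, zero_w, [0.0, 0.0, 0.0])
```

Because each output is a convex combination of the two inputs, the fused value in every dimension stays between the motion and scene values. In a trained model the gate would vary per clip, which is what makes the fusion adaptive.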

Findings

01. Achieved 96.21% accuracy on the Classroom Group Engagement Dataset.

02. Demonstrated the effectiveness of dual-stream fusion over single-stream models.

03. Conducted ablation studies confirming the contribution of each stream.

Abstract

Student engagement is crucial for improving learning outcomes in group activities. Highly engaged students both perform better individually and contribute to overall group success. However, most existing automated engagement recognition methods are designed for online classrooms or estimate engagement at the individual level. Addressing this gap, we propose DualEngage, a novel two-stream framework for group-level engagement recognition from in-classroom videos. It models engagement as a joint function of both individual and group-level behaviors. The primary stream models person-level motion dynamics by detecting and tracking students, extracting dense optical flow with the Recurrent All-Pairs Field Transforms (RAFT) network, encoding temporal motion patterns using a transformer encoder, and finally aggregating per-student representations through attention pooling into a unified…
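
The attention pooling step mentioned in the abstract can be illustrated with a small sketch. This is an assumption about the general technique, not the paper's exact formulation: each student's embedding is scored against a query vector, the scores are normalised with a softmax, and the group representation is the weighted sum of the embeddings. The function name `attention_pool` and the `query` parameter are hypothetical.

```python
import math


def attention_pool(student_feats, query):
    """Aggregate per-student embeddings into one group-level vector.

    student_feats: list of n embeddings, each a list of length d.
    query: list of length d (a hypothetical learned vector).
    Returns (pooled, weights), where weights sum to 1.
    """
    d = len(query)
    # Scaled dot-product score for each student.
    scores = [
        sum(f * q for f, q in zip(feat, query)) / math.sqrt(d)
        for feat in student_feats
    ]
    # Softmax with the maximum subtracted for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the embeddings, one output value per dimension.
    pooled = [
        sum(w * feat[j] for w, feat in zip(weights, student_feats))
        for j in range(d)
    ]
    return pooled, weights


student_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, weights = attention_pool(student_feats, [2.0, 2.0])
```

Here the third student aligns best with the query and therefore receives the largest weight. Unlike mean pooling, this lets the number of tracked students vary from clip to clip while still producing a fixed-size vector.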
