Talking Detection In Collaborative Learning Environments
Wenjing Shi, Marios S. Pattichis, Sylvia Celedón-Pattichis, Carlos LópezLeiva

TL;DR
This paper presents a simple, effective method for detecting talking activities in collaborative learning videos by using head detection and optical flow projections, outperforming complex 3D activity classification systems.
Contribution
The approach avoids training complex 3-D activity classification systems by combining head detection with optical flow projections. It achieves higher accuracy than TSN and C3D and can detect multiple speakers.
Findings
Achieves 59% overall accuracy in talking detection
Outperforms Temporal Segment Network (TSN, 42%) and Convolutional 3D (C3D, 45%)
Detects multiple speakers and talking instances
Abstract
We study the problem of detecting talking activities in collaborative learning videos. Our approach uses head detection and projections of the log-magnitude of optical flow vectors to reduce the problem to a simple classification of small projection images without the need for training complex, 3-D activity classification systems. The small projection images are then easily classified using a simple majority vote of standard classifiers. For talking detection, our proposed approach is shown to significantly outperform single activity systems. We have an overall accuracy of 59% compared to 42% for Temporal Segment Network (TSN) and 45% for Convolutional 3D (C3D). In addition, our method is able to detect multiple talking instances from multiple speakers, while also detecting the speakers themselves.
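The pipeline described in the abstract (crop a head region, project the log-magnitude of optical flow into a small image, classify it by majority vote) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the flow vectors have already been computed and cropped to a detected head, it assumes the projection is a sum over time, and the three threshold classifiers (`mean_energy_clf`, `peak_energy_clf`, `active_fraction_clf`) are hypothetical stand-ins for the "standard classifiers" the abstract mentions.

```python
import math
from collections import Counter


def log_magnitude(u, v):
    """Log-magnitude of one flow vector; log1p maps zero motion to 0."""
    return math.log1p(math.hypot(u, v))


def project_flow(flow_frames):
    """Collapse a stack of flow fields into one small 2-D projection image.

    flow_frames: list of frames; each frame is a list of rows of (u, v)
    tuples, already cropped to a head region.
    Assumption: the projection sums log-magnitudes over time; the paper's
    exact projection may differ.
    """
    rows, cols = len(flow_frames[0]), len(flow_frames[0][0])
    image = [[0.0] * cols for _ in range(rows)]
    for frame in flow_frames:
        for r in range(rows):
            for c in range(cols):
                u, v = frame[r][c]
                image[r][c] += log_magnitude(u, v)
    return image


# Hypothetical toy classifiers, used only to demonstrate the voting step.
def mean_energy_clf(image, threshold=0.5):
    values = [x for row in image for x in row]
    return "talking" if sum(values) / len(values) > threshold else "not talking"


def peak_energy_clf(image, threshold=1.0):
    peak = max(max(row) for row in image)
    return "talking" if peak > threshold else "not talking"


def active_fraction_clf(image, level=0.5, threshold=0.25):
    values = [x for row in image for x in row]
    active = sum(1 for x in values if x > level)
    return "talking" if active / len(values) > threshold else "not talking"


def majority_vote(labels):
    """Return the most common label; use an odd number of voters to avoid ties."""
    return Counter(labels).most_common(1)[0][0]


def classify_projection(image, classifiers):
    """Label one projection image by majority vote over the classifiers."""
    return majority_vote([clf(image) for clf in classifiers])


if __name__ == "__main__":
    classifiers = [mean_energy_clf, peak_energy_clf, active_fraction_clf]
    # Two 2x2 frames: no motion in the top row, motion in the bottom row.
    moving = [[[(0.0, 0.0), (0.0, 0.0)], [(3.0, 4.0), (3.0, 4.0)]]] * 2
    still = [[[(0.0, 0.0), (0.0, 0.0)], [(0.0, 0.0), (0.0, 0.0)]]] * 2
    print(classify_projection(project_flow(moving), classifiers))
    print(classify_projection(project_flow(still), classifiers))
```

To label several people in one frame, the same function could be applied once per detected head region. In practice the flow fields would come from an optical flow routine and the voters would be trained classifiers rather than fixed thresholds.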
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Video Analysis and Summarization
