TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals
Alexander Vedernikov, Puneet Kumar, Haoyu Chen, Tapio Seppanen, Xiaobai Li

TL;DR
TCCT-Net is a novel two-stream neural network architecture that efficiently estimates engagement using behavioral signals, combining convolutional-transformer and wavelet transform techniques for real-time applications.
Contribution
The paper introduces TCCT-Net, a two-stream architecture that improves engagement estimation speed and efficiency by integrating hybrid convolutional-transformer and wavelet-based feature extraction.
Findings
Outperforms existing baselines with fewer features.
Achieves an order-of-magnitude faster inference speed.
Uses only two behavioral features for engagement estimation.
Abstract
Engagement analysis finds various applications in healthcare, education, advertisement, and services. Deep Neural Networks, used for analysis, possess complex architecture and need large amounts of input data, computational power, and inference time. These constraints challenge embedding systems into devices for real-time use. To address these limitations, we present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture. To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer. In parallel, to efficiently extract rich patterns from the temporal-frequency domain and boost processing speed, we introduce a "TC" stream that uses Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form. Evaluated on the EngageNet dataset, the…
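The "TC" stream idea of turning a 1D behavioral signal into a 2D time-frequency tensor with a CWT can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the wavelet, the scales, or the signal length, so the Ricker (Mexican-hat) wavelet, the scale range 1 to 16, the 128-sample synthetic sine input, and the helper names ricker and cwt are all assumptions chosen for illustration.

```python
import math


def ricker(t, scale):
    """Ricker (Mexican-hat) wavelet evaluated at offset t for a given scale.

    Assumption: the wavelet choice is illustrative, not taken from the paper.
    """
    x = t / scale
    norm = 2.0 / (math.sqrt(3.0 * scale) * math.pi ** 0.25)
    return norm * (1.0 - x * x) * math.exp(-0.5 * x * x)


def cwt(signal, scales):
    """Return a scalogram with one row per scale and one column per sample."""
    n = len(signal)
    rows = []
    for s in scales:
        half = int(math.ceil(5 * s))  # truncate the wavelet to +/- 5 scales
        kernel = [ricker(k, s) for k in range(-half, half + 1)]
        row = []
        for i in range(n):
            acc = 0.0
            for j, w in enumerate(kernel):
                idx = i + j - half
                if 0 <= idx < n:  # samples outside the signal count as zero
                    acc += signal[idx] * w
            row.append(acc)
        rows.append(row)
    return rows


# Hypothetical stand-in for one behavioral feature signal.
signal = [math.sin(2.0 * math.pi * t / 32.0) for t in range(128)]
scales = list(range(1, 17))

# 2D tensor (scales x time) that a convolutional stream could consume.
scalogram = cwt(signal, scales)
print(len(scalogram), len(scalogram[0]))
```

Each row of the result holds the signal's response at one scale, so the output can be treated like a single-channel image. In practice a library routine would replace the explicit loops, which are written out here only to keep the sketch dependency-free.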
Taxonomy
Topics: Recommender Systems and Techniques · Emotion and Mood Recognition
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
