TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals

Alexander Vedernikov; Puneet Kumar; Haoyu Chen; Tapio Seppanen; Xiaobai Li

arXiv:2404.09474 · cs.CV · May 15, 2024 · 1 citation


TL;DR

TCCT-Net is a novel two-stream neural network architecture that estimates engagement from behavioral feature signals, combining a hybrid convolutional-transformer stream with a wavelet-transform stream to support real-time applications.

Contribution

The paper introduces TCCT-Net, a two-stream architecture that improves engagement estimation speed and efficiency by integrating hybrid convolutional-transformer and wavelet-based feature extraction.
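The two-stream idea can be illustrated with a minimal late-fusion sketch. It assumes each stream (CT and TC) has already produced a fixed-length embedding per sample and fuses them by plain concatenation followed by a linear softmax head. The function name `fuse_and_classify`, the embedding sizes, and the number of engagement classes are hypothetical and are not taken from the TCCT-Net code.

```python
import numpy as np


def fuse_and_classify(ct_feat: np.ndarray, tc_feat: np.ndarray,
                      weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Hypothetical late fusion of two stream embeddings.

    ct_feat: (batch, d_ct) embedding from the convolution-transformer stream.
    tc_feat: (batch, d_tc) embedding from the wavelet/tensor-convolution stream.
    weight:  (d_ct + d_tc, num_classes); bias: (num_classes,).
    Returns class probabilities of shape (batch, num_classes).
    """
    # Concatenate the per-stream features along the feature axis.
    fused = np.concatenate([ct_feat, tc_feat], axis=-1)
    logits = fused @ weight + bias
    # Numerically stable softmax over the class axis.
    logits = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)


# Usage with random placeholder embeddings (not real model outputs).
rng = np.random.default_rng(0)
ct = rng.normal(size=(2, 8))
tc = rng.normal(size=(2, 16))
W = rng.normal(size=(24, 4))
b = np.zeros(4)
probs = fuse_and_classify(ct, tc, W, b)
```

Concatenation is only the simplest fusion choice; weighted or attention-based fusion are common alternatives, and the paper's actual fusion layer may differ.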

Findings

01. Outperforms existing baselines while using fewer features.
02. Achieves an order-of-magnitude speedup in inference.
03. Uses only two behavioral features for engagement estimation.

Abstract

Engagement analysis finds various applications in healthcare, education, advertisement, and services. Deep Neural Networks used for such analysis have complex architectures and require large amounts of input data, computational power, and inference time. These constraints make it challenging to embed such systems in devices for real-time use. To address these limitations, we present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture. To better learn meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer. In parallel, to efficiently extract rich patterns from the temporal-frequency domain and boost processing speed, we introduce a "TC" stream that uses the Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form. Evaluated on the EngageNet dataset, the…
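The key step of the "TC" stream, turning a 1D behavioral signal into a 2D time-frequency tensor via the CWT, can be sketched in NumPy. This is a minimal illustration that assumes a complex Morlet wavelet with center frequency `w0 = 6`, a 30 Hz frame rate `fs`, and scales derived from target frequencies. The helper names `morlet`, `cwt_scalogram`, and `signals_to_tensor` are hypothetical, and the paper's wavelet, scales, and normalization may differ.

```python
import numpy as np


def morlet(t: np.ndarray, w0: float = 6.0) -> np.ndarray:
    """Complex Morlet wavelet (admissibility correction omitted, fine for w0 >= 5)."""
    return np.pi ** -0.25 * np.exp(1j * w0 * t) * np.exp(-0.5 * t ** 2)


def cwt_scalogram(signal, scales, fs: float = 30.0, w0: float = 6.0) -> np.ndarray:
    """Return |CWT| with shape (len(scales), len(signal)); scales are in seconds."""
    x = np.asarray(signal, dtype=float)
    n = x.size
    out = np.empty((len(scales), n))
    for i, s in enumerate(scales):
        # Sample the scaled wavelet over +/- 4 scale widths (its envelope is ~0 beyond).
        half = int(np.ceil(4 * s * fs))
        t = np.arange(-half, half + 1) / fs
        psi = morlet(t / s, w0) / np.sqrt(s)
        # The CWT correlates the signal with the wavelet, i.e. convolves it with the
        # conjugated, time-reversed wavelet. "full" mode plus slicing keeps n samples
        # centred on each time step even when the kernel is longer than the signal.
        full = np.convolve(x, np.conj(psi[::-1]), mode="full") / fs
        out[i] = np.abs(full[half:half + n])
    return out


def signals_to_tensor(signals, scales, fs: float = 30.0) -> np.ndarray:
    """Stack one scalogram per feature signal into a (channels, scales, time) tensor."""
    return np.stack([cwt_scalogram(sig, scales, fs) for sig in signals])


# Usage: two synthetic feature signals sampled at 30 Hz for 10 s.
fs = 30.0
t = np.arange(0, 10, 1 / fs)
signals = [np.sin(2 * np.pi * 2.0 * t), np.sin(2 * np.pi * 0.5 * t)]
freqs = np.array([0.5, 1.0, 2.0, 4.0, 8.0])   # target frequencies in Hz
scales = 6.0 / (2 * np.pi * freqs)             # Morlet scale ~ w0 / (2*pi*f)
tensor = signals_to_tensor(signals, scales, fs)
```

Taking the magnitude yields one scalogram per feature, so the stacked result is an image-like (channels, scales, time) array of the kind a 2D convolutional stream can consume. In this synthetic example, the 2 Hz signal's energy concentrates in the 2 Hz row and the 0.5 Hz signal's energy in the 0.5 Hz row.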


Code & Models

Repositories

vedernikovphoto/tcct_net (PyTorch, official)


Taxonomy

Topics: Recommender Systems and Techniques · Emotion and Mood Recognition