Modelling Temporal Information Using Discrete Fourier Transform for Recognizing Emotions in User-generated Videos
Haimin Zhang, Min Xu

TL;DR
This paper introduces a method that combines deep CNN features with frequency domain DFT features to effectively model temporal information for emotion recognition in user-generated videos, achieving state-of-the-art accuracy.
Contribution
It proposes a novel approach that transforms frame-level features into the frequency domain using DFT to capture temporal dynamics for improved emotion recognition.
Findings
DFT features enhance temporal modeling in videos.
The method improves emotion recognition accuracy from 51.1% to 62.6%.
Achieves state-of-the-art performance on the VideoEmotion-8 dataset.
Abstract
With the spread of user-generated Internet videos, emotion recognition in those videos is attracting increasing research effort. However, most existing works are based on frame-level visual features and/or audio features, which might fail to model temporal information, e.g. characteristics accumulated over time. To capture video temporal information, in this paper we propose to analyse features in the frequency domain, obtained by the discrete Fourier transform (DFT features). Frame-level features are first extracted by a pre-trained deep convolutional neural network (CNN). Then, the time-domain features are transformed and interpolated into DFT features. CNN and DFT features are further encoded and fused for emotion classification. In this way, static image features extracted from a pre-trained deep CNN and temporal information represented by DFT features are jointly considered for…
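The pipeline described in the abstract (frame-level features, a DFT along time, interpolation to a fixed length, then fusion) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `dft_temporal_features` and `fuse_features`, the use of the magnitude spectrum, linear interpolation, the `n_bins` parameter, and average pooling as a stand-in for the paper's encoding step are all assumptions made for illustration. The input is assumed to be a `(T, D)` array of per-frame CNN features that has already been extracted.

```python
import numpy as np


def dft_temporal_features(frame_feats, n_bins=16):
    """Map (T, D) per-frame features to a fixed-size (n_bins, D) spectrum.

    Assumption: magnitude spectrum plus linear interpolation; the paper's
    exact transform and interpolation scheme may differ.
    """
    frame_feats = np.asarray(frame_feats, dtype=np.float64)
    # DFT along the time axis. rfft keeps only the non-redundant half
    # of the spectrum for real-valued input: shape (T // 2 + 1, D).
    spectrum = np.abs(np.fft.rfft(frame_feats, axis=0))
    # Resample every feature dimension to n_bins frequency positions so
    # that videos with different frame counts give same-sized outputs.
    src = np.linspace(0.0, 1.0, spectrum.shape[0])
    dst = np.linspace(0.0, 1.0, n_bins)
    columns = [np.interp(dst, src, spectrum[:, d])
               for d in range(spectrum.shape[1])]
    return np.stack(columns, axis=1)


def fuse_features(frame_feats, n_bins=16):
    """Concatenate a static descriptor with the flattened DFT features.

    Assumption: average pooling over frames stands in for the paper's
    encoding of CNN features; fusion here is plain concatenation.
    """
    frame_feats = np.asarray(frame_feats, dtype=np.float64)
    static = frame_feats.mean(axis=0)                       # shape (D,)
    temporal = dft_temporal_features(frame_feats, n_bins)   # (n_bins, D)
    return np.concatenate([static, temporal.ravel()])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    video = rng.normal(size=(30, 8))  # hypothetical: 30 frames, 8-dim features
    print(fuse_features(video).shape)
```

The fused vector would then be passed to a classifier of choice. Resampling the spectrum to a fixed number of bins is what lets videos of different lengths share one feature dimensionality in this sketch.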
Taxonomy
Topics: Human Pose and Action Recognition · Emotion and Mood Recognition · Video Analysis and Summarization
