TL;DR
This paper introduces L-GrIN (Learnable Graph Inception Network), a graph-based neural network that jointly learns to recognize emotion and to identify the underlying graph structure of dynamic data across multiple modalities, reaching state-of-the-art accuracy with fewer parameters than CNN- or RNN-based models.
Contribution
The paper presents a unified graph learning and classification framework with new graph convolution and pooling layers for emotion recognition across diverse data modalities.
Findings
Achieves state-of-the-art accuracy on five emotion recognition datasets.
Uses fewer parameters than traditional CNNs or RNNs, which makes it suitable for resource-limited devices.
Demonstrates effective cross-modality generalization and dynamic graph learning.
Abstract
Human emotion is expressed, perceived and captured using a variety of dynamic data modalities, such as speech (verbal), videos (facial expressions) and motion sensors (body gestures). We propose a generalized approach to emotion recognition that can adapt across modalities by modeling dynamic data as structured graphs. The motivation behind the graph approach is to build compact models without compromising on performance. To alleviate the problem of optimal graph construction, we cast this as a joint graph learning and classification task. To this end, we present the Learnable Graph Inception Network (L-GrIN) that jointly learns to recognize emotion and to identify the underlying graph structure in the dynamic data. Our architecture comprises multiple novel components: a new graph convolution operation, a graph inception layer, learnable adjacency, and a learnable pooling function that…
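To make the idea of a learnable adjacency concrete, here is a minimal sketch in plain Python. It is not the paper's graph convolution, inception layer, or pooling function, whose exact definitions are not given in this summary. It assumes a generic formulation: the adjacency is a free parameter matrix (`adj_logits`, a hypothetical name) normalized row-wise with a softmax, node features are aggregated over that adjacency and passed through a shared linear map with ReLU, and simple mean pooling stands in for the learnable pooling. Treating nodes as, for example, frames of a sequence is also an assumption.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over one row of adjacency logits."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def graph_conv(x, adj_logits, weight):
    """Generic graph convolution: ReLU(softmax(adj_logits) @ x @ weight).

    adj_logits is a trainable (num_nodes x num_nodes) parameter here, so the
    graph structure would be learned together with `weight` during training.
    This is an illustrative stand-in, not the operation defined in the paper.
    """
    adj = [softmax(row) for row in adj_logits]  # each row sums to 1
    h = matmul(matmul(adj, x), weight)          # aggregate, then transform
    return [[max(0.0, v) for v in row] for row in h]


def mean_pool(h):
    """Average node embeddings into one graph-level vector (assumed pooling)."""
    n = len(h)
    return [sum(col) / n for col in zip(*h)]


rng = random.Random(0)
num_nodes, in_dim, out_dim = 4, 3, 2

# Toy node features, e.g. one feature vector per frame (assumption).
x = [[rng.gauss(0, 1) for _ in range(in_dim)] for _ in range(num_nodes)]
# Zero logits give a uniform, fully connected graph before any training.
adj_logits = [[0.0] * num_nodes for _ in range(num_nodes)]
weight = [[rng.gauss(0, 1) for _ in range(out_dim)] for _ in range(in_dim)]

h = graph_conv(x, adj_logits, weight)
graph_embedding = mean_pool(h)  # would feed an emotion classifier
```

In a trained model, gradients from the classification loss would update `adj_logits` alongside `weight`, which is one simple way to realize joint graph learning and classification. The parameter count here is `num_nodes**2 + in_dim * out_dim`, which illustrates why a graph model of this kind can stay compact.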
Taxonomy
Methods: Convolution
