Audio-Visual Compound Expression Recognition Method based on Late Modality Fusion and Rule-based Decision
Elena Ryumina, Maxim Markitantov, Dmitry Ryumin, Heysem Kaya, Alexey Karpov

TL;DR
This paper introduces a zero-shot audio-visual method for compound expression recognition that fuses modalities at the emotion probability level and uses rule-based decisions, achieving a 22.01% F1-score on the C-EXPR-DB test subset.
Contribution
The paper presents a novel zero-shot approach combining modality fusion and rule-based decision-making for compound expression recognition without task-specific training.
Findings
Achieved a 22.01% F1-score on the C-EXPR-DB test subset.
Demonstrated potential for annotating audio-visual emotional data.
Evaluated the method in multi-corpus training and cross-corpus validation setups.
Abstract
This paper presents the results of the SUN team for the Compound Expressions Recognition Challenge of the 6th ABAW Competition. We propose a novel audio-visual method for compound expression recognition. Our method relies on emotion recognition models that fuse modalities at the emotion probability level, while decisions on which compound expression to predict are based on predefined rules. Notably, our method does not use any training data specific to the target task, so the problem is treated as zero-shot classification. The method is evaluated in multi-corpus training and cross-corpus validation setups. The proposed method achieves an F1-score of 22.01% on the C-EXPR-DB test subset. Our findings from the challenge demonstrate that the proposed method can potentially form a basis for developing intelligent tools for annotating audio-visual data in the…
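The abstract describes two stages. First, per-modality emotion probabilities are fused into one vector. Then a predefined rule maps that vector to a compound expression. The sketch below illustrates such a pipeline under stated assumptions and is not the authors' implementation. Several details are hypothetical placeholders, not the paper's published rules or weights:

- the seven-class basic-emotion label order,
- the seven compound classes with their constituent pairs,
- the fusion weights,
- the sum-of-constituents scoring rule.

The names `late_fusion` and `compound_rule` are illustrative only.

```python
import numpy as np

# Assumed label order of the per-modality basic-emotion classifiers (hypothetical).
EMOTIONS = ["Neutral", "Anger", "Disgust", "Fear", "Happiness", "Sadness", "Surprise"]

# Assumed compound classes, each mapped to its two constituent basic emotions.
COMPOUNDS = {
    "Fearfully Surprised": ("Fear", "Surprise"),
    "Happily Surprised": ("Happiness", "Surprise"),
    "Sadly Surprised": ("Sadness", "Surprise"),
    "Disgustedly Surprised": ("Disgust", "Surprise"),
    "Angrily Surprised": ("Anger", "Surprise"),
    "Sadly Fearful": ("Sadness", "Fear"),
    "Sadly Angry": ("Sadness", "Anger"),
}


def late_fusion(probs_by_modality, weights):
    """Fuse at the probability level: weighted average of per-modality vectors."""
    modalities = list(weights)  # fixed order shared by the vectors and the weights
    stacked = np.stack([np.asarray(probs_by_modality[m], dtype=float) for m in modalities])
    w = np.array([weights[m] for m in modalities], dtype=float)
    return (w[:, None] * stacked).sum(axis=0) / w.sum()


def compound_rule(fused):
    """Hypothetical rule: choose the compound whose two constituents have the
    largest summed fused probability (Neutral is never used)."""
    idx = {e: i for i, e in enumerate(EMOTIONS)}
    scores = {c: fused[idx[a]] + fused[idx[b]] for c, (a, b) in COMPOUNDS.items()}
    return max(scores, key=scores.get)


# Usage with made-up probabilities and made-up fusion weights.
audio = np.array([0.10, 0.05, 0.05, 0.30, 0.05, 0.05, 0.40])
video = np.array([0.05, 0.05, 0.05, 0.20, 0.10, 0.05, 0.50])
fused = late_fusion({"audio": audio, "video": video}, {"audio": 0.4, "video": 0.6})
print(compound_rule(fused))  # Fearfully Surprised
```

Because no compound-specific training data is used, all task knowledge in this sketch lives in the `COMPOUNDS` table and the scoring rule. This mirrors the zero-shot framing, though the paper's actual rules may differ.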
Taxonomy
Topics: Simulation and Modeling Applications
