Team RAS in 9th ABAW Competition: Multimodal Compound Expression Recognition Approach

Elena Ryumina; Maxim Markitantov; Alexandr Axyonov; Dmitry Ryumin; Mikhail Dolgushin; Alexey Karpov

arXiv:2507.02205·cs.CV·July 8, 2025


TL;DR

This paper introduces a zero-shot multimodal approach for compound expression recognition that combines six heterogeneous modalities and relies on zero-shot components (CLIP-based label matching and Qwen-VL for semantic scene understanding), achieving competitive results without task-specific training.

Contribution

The work presents a new zero-shot multimodal framework with a Multi-Head Probability Fusion module and semantic scene understanding, advancing affective computing for complex emotion detection.

Findings

01

Achieved an F1 score of 46.95% on AffWild2

02

Achieved an F1 score of 49.02% on AFEW

03

Achieved an F1 score of 34.85% on C-EXPR-DB

Abstract

Compound Expression Recognition (CER), a subfield of affective computing, aims to detect complex emotional states formed by combinations of basic emotions. In this work, we present a novel zero-shot multimodal approach for CER that combines six heterogeneous modalities into a single pipeline: static and dynamic facial expressions, scene and label matching, scene context, audio, and text. Unlike previous approaches relying on task-specific training data, our approach uses zero-shot components, including Contrastive Language-Image Pretraining (CLIP)-based label matching and Qwen-VL for semantic scene understanding. We further introduce a Multi-Head Probability Fusion (MHPF) module that dynamically weights modality-specific predictions, followed by a Compound Expressions (CE) transformation module that uses Pair-Wise Probability Aggregation (PPA) and Pair-Wise Feature Similarity…
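
To make the fusion and pair-wise aggregation ideas above more concrete, here is a minimal sketch. It is not the authors' implementation. The abstract does not say how MHPF computes its dynamic weights or how PPA combines probabilities, so this example assumes fixed, hand-set modality weights and a simple sum over each pair of basic-emotion probabilities. The names `fuse`, `to_compound`, `BASIC`, and `COMPOUND`, as well as the three compound pairs, are hypothetical and chosen only for illustration.

```python
import numpy as np

# Basic emotion order assumed for this sketch (not specified in the source).
BASIC = ["Neutral", "Anger", "Disgust", "Fear", "Happiness", "Sadness", "Surprise"]

# Illustrative compound classes, each defined as a pair of basic emotions.
COMPOUND = {
    "Fearfully Surprised": ("Fear", "Surprise"),
    "Happily Surprised": ("Happiness", "Surprise"),
    "Sadly Angry": ("Sadness", "Anger"),
}


def fuse(modality_probs, weights):
    """Weighted average of per-modality probability vectors.

    Stand-in for a learned fusion module: weights are fixed here,
    whereas the source describes them as dynamic.
    """
    probs = np.asarray(modality_probs, dtype=float)  # shape (modalities, classes)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                  # normalise the weights
    fused = w @ probs                                # shape (classes,)
    return fused / fused.sum()                       # renormalise to sum to 1


def to_compound(basic_probs):
    """Score each compound class by summing its two basic-emotion probabilities."""
    scores = {
        name: basic_probs[BASIC.index(a)] + basic_probs[BASIC.index(b)]
        for name, (a, b) in COMPOUND.items()
    }
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}


if __name__ == "__main__":
    # Two hypothetical modalities (for example, face and audio) over BASIC.
    face = [0.05, 0.05, 0.05, 0.40, 0.05, 0.05, 0.35]
    audio = [0.10, 0.05, 0.05, 0.30, 0.10, 0.10, 0.30]
    fused = fuse([face, audio], weights=[2.0, 1.0])
    compound = to_compound(fused)
    print(max(compound, key=compound.get))
```

In this toy setup the face modality is trusted twice as much as audio. A learned fusion module would replace the fixed `weights` argument with values predicted per sample.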
