Agent-Based Modular Learning for Multimodal Emotion Recognition in Human-Agent Systems

Matvey Nepomnyaschiy; Oleg Pereziabov; Anvar Tliamov; Stanislav Mikhailov; Ilya Afanasyev

arXiv:2512.10975 · cs.LG · December 15, 2025

Open Access

TL;DR

This paper introduces a multi-agent modular framework for multimodal emotion recognition in human-agent systems, enhancing flexibility, scalability, and training efficiency by treating modality encoders and classifiers as autonomous agents.

Contribution

It presents a novel multi-agent architecture that allows modular integration, replacement, and efficient training of multimodal emotion recognition components.

Findings

01. Supports vision, audio, and text modalities in a proof-of-concept implementation.

02. Reduces computational overhead during training.

03. Improves flexibility and scalability of emotion recognition systems.

Abstract

Effective human-agent interaction (HAI) relies on accurate and adaptive perception of human emotional states. Multimodal deep learning models that combine facial expressions, speech, and textual cues offer high accuracy in emotion recognition, but they are often computationally expensive to train and maintain, and they adapt poorly when modalities are added or removed. In this work, we propose a novel multi-agent framework for training multimodal emotion recognition systems, where each modality encoder and the fusion classifier operate as autonomous agents coordinated by a central supervisor. This architecture enables modular integration of new modalities (e.g., audio features via emotion2vec), seamless replacement of outdated components, and reduced computational overhead during training. We demonstrate the feasibility of our approach through a proof-of-concept implementation supporting vision, audio,…
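The supervisor-and-agents arrangement described in the abstract can be pictured with a small sketch. This is an illustration only, not the authors' implementation: the class names `EncoderAgent`, `FusionAgent`, and `Supervisor`, the toy encoders, and the score-averaging fusion rule are all assumptions made for this example, and the sketch covers inference-time routing only, since the abstract does not spell out the training procedure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Vector = List[float]


@dataclass
class EncoderAgent:
    """Hypothetical modality agent: maps one raw input to per-label scores."""
    name: str
    encode: Callable[[object], Vector]


@dataclass
class FusionAgent:
    """Hypothetical fusion classifier (assumption: plain score averaging)."""
    labels: Sequence[str]

    def classify(self, embeddings: Dict[str, Vector]) -> str:
        n = len(embeddings)
        # Average the score each encoder assigns to every label.
        mean = [
            sum(vec[i] for vec in embeddings.values()) / n
            for i in range(len(self.labels))
        ]
        return self.labels[mean.index(max(mean))]


class Supervisor:
    """Hypothetical coordinator that routes inputs to registered agents."""

    def __init__(self, fusion: FusionAgent) -> None:
        self.fusion = fusion
        self.encoders: Dict[str, EncoderAgent] = {}

    def register(self, agent: EncoderAgent) -> None:
        # Registering under an existing name replaces that agent in place.
        self.encoders[agent.name] = agent

    def predict(self, inputs: Dict[str, object]) -> str:
        # Only modalities that have a registered agent contribute.
        embeddings = {
            name: self.encoders[name].encode(raw)
            for name, raw in inputs.items()
            if name in self.encoders
        }
        if not embeddings:
            raise ValueError("no registered agent matches the given inputs")
        return self.fusion.classify(embeddings)


# Toy encoders (assumptions): each returns [happy_score, sad_score].
supervisor = Supervisor(FusionAgent(labels=("happy", "sad")))
supervisor.register(
    EncoderAgent("text", lambda s: [float("great" in s), float("awful" in s)])
)
supervisor.register(EncoderAgent("vision", lambda smile: [smile, 1.0 - smile]))

first = supervisor.predict({"text": "this is great", "vision": 0.9})

# A new modality is added without touching the existing agents.
supervisor.register(EncoderAgent("audio", lambda energy: [energy, 1.0 - energy]))
second = supervisor.predict({"text": "awful day", "vision": 0.2, "audio": 0.1})

print(first, second)
```

In this sketch, adding the audio agent or swapping the vision agent is a single `register` call, which is the kind of modular integration and replacement the abstract refers to. A real system would use learned encoders and a trained fusion model rather than these hand-written rules.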

Taxonomy

Topics: Emotion and Mood Recognition · Social Robot Interaction and HRI · Face recognition and analysis