A Cloud-Based Cross-Modal Transformer for Emotion Recognition and Adaptive Human-Computer Interaction
Ziwen Zhong, Zhitao Shu, Yue Zhao

TL;DR
This paper introduces a cloud-based cross-modal transformer framework that fuses visual, auditory, and textual data to improve both the accuracy and the latency of real-time emotion recognition for adaptive human-computer interaction.
Contribution
It presents a novel cloud-enabled multimodal transformer model with cross-modal attention, achieving state-of-the-art performance and low latency in emotion recognition tasks.
Findings
Achieves a 3% higher F1-score on benchmark datasets.
Reduces response latency by 35% compared to traditional systems.
Demonstrates scalability and robustness in real-world HCI applications.
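The 35% latency figure is most naturally read as a relative reduction against the traditional-system baseline. Assuming that reading (the summary states neither the baseline latency nor whether the figure is relative), it implies the following. The symbols $t_{\text{base}}$ and $t_{\text{CMT}}$ are introduced here for illustration only.

$$
\frac{t_{\text{base}} - t_{\text{CMT}}}{t_{\text{base}}} = 0.35 \quad\Longrightarrow\quad t_{\text{CMT}} = 0.65\, t_{\text{base}}
$$

For example, a hypothetical 200 ms baseline response time would drop to about 130 ms.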
Abstract
Emotion recognition is a fundamental component of next-generation human-computer interaction (HCI), enabling machines to perceive, understand, and respond to users' affective states. However, existing systems often rely on single-modality analysis such as facial expressions, speech tone, or textual sentiment, resulting in limited robustness and poor generalization in real-world environments. To address these challenges, this study proposes a Cloud-Based Cross-Modal Transformer (CMT) framework for multimodal emotion recognition and adaptive human-computer interaction. The proposed model integrates visual, auditory, and textual signals using pretrained encoders (Vision Transformer, Wav2Vec2, and BERT) and employs a cross-modal attention mechanism to capture complex interdependencies among heterogeneous features. By leveraging cloud computing infrastructure with distributed training on…
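The abstract's central mechanism is cross-modal attention, in which features from one modality attend to features from another. The pure-Python sketch below shows scaled dot-product attention where text tokens act as queries and audio or visual tokens supply the keys and values. It relies on several simplifying assumptions. The learned projection matrices are omitted. The fusion scheme, mean-pooling followed by concatenation, is hypothetical. The helpers `cross_modal_attention` and `fuse` are illustrative names, not the authors' code. Real inputs would come from the ViT, Wav2Vec2, and BERT encoders, projected to a shared dimension.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul_t(a: Matrix, b: Matrix) -> Matrix:
    """Return a @ b^T for row-major nested lists."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b] for row_a in a]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: queries from one modality,
    keys/values from another. Learned W_Q/W_K/W_V projections are
    omitted (assumption: inputs already share one embedding size)."""
    d_k = len(keys[0])
    scores = matmul_t(queries, keys)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    d_v = len(values[0])
    # Each output row is a weighted average of the other modality's value vectors.
    return [
        [sum(w * values[j][c] for j, w in enumerate(row_w)) for c in range(d_v)]
        for row_w in weights
    ]


def mean_pool(m: Matrix) -> List[float]:
    """Average token vectors into a single vector."""
    return [sum(col) / len(m) for col in zip(*m)]


def fuse(text: Matrix, audio: Matrix, visual: Matrix) -> List[float]:
    """Hypothetical fusion: text attends to audio and to visual tokens,
    then pooled text, text->audio, and text->visual vectors are concatenated."""
    t2a = cross_modal_attention(text, audio, audio)
    t2v = cross_modal_attention(text, visual, visual)
    return mean_pool(text) + mean_pool(t2a) + mean_pool(t2v)


# Usage with toy 2-dimensional token embeddings.
text = [[0.2, 0.8], [0.9, 0.1]]
audio = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
visual = [[0.3, 0.7]]
fused = fuse(text, audio, visual)
print(len(fused))  # 6: three pooled 2-d vectors concatenated
```

In a full model the fused vector would feed a classification head over emotion labels. Attending from text to the other two modalities is one design choice among several. Bidirectional or all-pairs attention between modalities is equally plausible, and the summary does not say which the paper uses.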
Taxonomy
Topics: Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining · EEG and Brain-Computer Interfaces
