Three-Class Emotion Classification for Audiovisual Scenes Based on Ensemble Learning Scheme
Xiangrui Xiong, Zhou Zhou, Guocai Nong, Junlin Deng, Ning Wu

TL;DR
This paper introduces an audio-only ensemble learning framework that classifies movie scenes into three emotional categories, achieving high accuracy while remaining suitable for resource-limited devices.
Contribution
A novel lightweight ensemble model combining SVMs and neural networks for audio-based emotion classification in audiovisual scenes.
Findings
Achieved 86% accuracy on a real-world dataset
Effective feature extraction and preprocessing pipeline
Demonstrated suitability for resource-constrained environments
Abstract
Emotion recognition plays a pivotal role in enhancing human-computer interaction, particularly in movie recommendation systems where understanding emotional content is essential. While multimodal approaches combining audio and video have demonstrated effectiveness, their reliance on high-performance graphical computing limits deployment on resource-constrained devices such as personal computers or home audiovisual systems. To address this limitation, this study proposes a novel audio-only ensemble learning framework capable of classifying movie scenes into three emotional categories: Good, Neutral, and Bad. The model integrates ten support vector machines and six neural networks within a stacking ensemble architecture to enhance classification performance. A tailored data preprocessing pipeline, including feature extraction, outlier handling, and feature engineering, is designed to…
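The stacking architecture described in the abstract (ten SVMs and six neural networks feeding a meta-learner) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the hyperparameters, the logistic-regression meta-learner, the feature scaling step, the label index order, and the synthetic stand-in for audio features are all assumptions made for the example.

```python
# Illustrative sketch only. Hyperparameters, the meta-learner, the scaler,
# and the synthetic data are assumptions, not details taken from the paper.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Class names come from the abstract; the index order is an assumption.
LABELS = ["Good", "Neutral", "Bad"]


def build_stacking_model(n_svm=10, n_nn=6, seed=0):
    """Build a stacking ensemble of SVM and neural-network base learners."""
    # Vary C across SVMs so the base learners are not identical (assumed scheme).
    svms = [
        (f"svm_{i}", make_pipeline(StandardScaler(), SVC(C=0.5 * (i + 1))))
        for i in range(n_svm)
    ]
    # Vary hidden-layer width and random seed across the networks (assumed scheme).
    nns = [
        (
            f"nn_{i}",
            make_pipeline(
                StandardScaler(),
                MLPClassifier(
                    hidden_layer_sizes=(8 + 4 * i,),
                    max_iter=500,
                    random_state=seed + i,
                ),
            ),
        )
        for i in range(n_nn)
    ]
    # The meta-learner is trained on cross-validated base-learner outputs.
    return StackingClassifier(
        estimators=svms + nns,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=3,
    )


# Synthetic 3-class data standing in for extracted audio features.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=8, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = build_stacking_model()
model.fit(X_train, y_train)
pred = model.predict(X_test)
pred_names = [LABELS[k] for k in pred]
print("held-out accuracy on synthetic data:", model.score(X_test, y_test))
```

The accuracy printed here reflects only the synthetic data and says nothing about the paper's reported result. In practice the input would be the audio features produced by the paper's preprocessing pipeline.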
Taxonomy
Topics: Emotion and Mood Recognition · Music and Audio Processing · Sentiment Analysis and Opinion Mining
