B-GRPO: Unsupervised Speech Emotion Recognition based on Batched-Group Relative Policy Optimization
Yingying Gao, Shilei Zhang, Runyan Yang, Zihao Cui, Junlan Feng

TL;DR
This paper introduces B-GRPO, an unsupervised speech emotion recognition method using a modified reinforcement learning approach that improves performance by leveraging batch-based sample selection and self-reward functions.
Contribution
It proposes a novel batch-based reinforcement learning framework for unsupervised speech emotion recognition, incorporating self-reward and teacher-reward functions for better sample quality assessment.
Findings
Achieved a 19.8% performance improvement over the baseline that does not use RL.
Introduced a batch-group policy optimization method tailored for classification.
Demonstrated effectiveness of self-reward functions in unsupervised SER.
Abstract
Unsupervised speech emotion recognition (SER) addresses the data sparsity and annotation bias of emotional speech. Reinforcement learning (RL) is a promising approach that improves performance through rule-based or model-based verification functions rather than human annotations. We treat sample selection during learning as a long-term procedure and the decision of whether to select a sample as the policy's action, which allows RL to be used to measure sample quality in SER. We propose a modified Group Relative Policy Optimization (GRPO) adapted to classification problems, which takes the samples in a batch as a group and uses their average reward as the baseline for calculating the advantage. Rather than using a verifiable reward function as in GRPO, we put forward self-reward functions and teacher-reward functions to…
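The batch-as-group baseline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `batch_group_advantages`, the example reward values, and the optional standard-deviation scaling (borrowed from standard GRPO, which the abstract does not confirm for B-GRPO) are all assumptions made for illustration. The rewards are taken as given, since the abstract does not specify how the self-reward or teacher-reward functions are computed.

```python
from statistics import mean, pstdev


def batch_group_advantages(rewards, normalize=False, eps=1e-8):
    """Compute per-sample advantages with the batch mean as baseline.

    rewards   : list of scalar rewards, one per sample in the batch
                (hypothetical values; the reward functions themselves
                are not specified in the abstract).
    normalize : if True, also divide by the batch standard deviation,
                as standard GRPO does within a group. ASSUMPTION: the
                abstract only mentions the mean baseline.
    """
    if not rewards:
        return []
    baseline = mean(rewards)  # average reward of the batch
    centered = [r - baseline for r in rewards]
    if not normalize:
        return centered
    scale = pstdev(rewards) + eps  # eps avoids division by zero
    return [c / scale for c in centered]


# Example batch of four samples with made-up rewards.
example_rewards = [1.0, 0.0, 0.5, 0.5]
print(batch_group_advantages(example_rewards))  # [0.5, -0.5, 0.0, 0.0]
```

A positive advantage marks a sample whose reward is above the batch average, and a negative one marks a sample below it, so the whole batch plays the role that a group of sampled responses plays in standard GRPO.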
Taxonomy
Topics: Emotion and Mood Recognition · Speech Recognition and Synthesis · Voice and Speech Disorders
