# Advanced feature selection and temporal attention mechanisms with Bi-LSTM classifier for optimizing emotion recognition in Kashmiri speech

**Authors:** GH Mohmad Dar, Radhakrishnan Delhibabu

PMC · DOI: 10.3389/frai.2026.1768701 · Frontiers in Artificial Intelligence · 2026-03-18

## TL;DR

This paper improves emotion recognition in Kashmiri speech by combining optimized acoustic feature selection with temporal attention in LSTM networks, raising accuracy from 86% (baseline LSTM) to 90.2%.

## Contribution

The novelty lies in integrating temporal attention mechanisms with optimized feature selection for emotion recognition in the Kashmiri language.

## Key findings

- The attention-augmented LSTM model achieved 90.2% accuracy, surpassing the baseline LSTM model's 86%.
- Precision, recall, and F1-scores improved notably across multiple emotional categories.
- The method provides a baseline for speech emotion recognition in low-resource languages like Kashmiri.
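
To make the per-category metrics in these findings concrete, here is a minimal sketch of how precision, recall, and F1 can be computed per emotion class from true and predicted labels. The function name `per_class_metrics`, the emotion labels, and the toy predictions are all hypothetical illustrations; they are not the paper's label set, data, or results.

```python
def per_class_metrics(y_true, y_pred, labels):
    """Return precision, recall, and F1 for each label (one-vs-rest)."""
    metrics = {}
    pairs = list(zip(y_true, y_pred))
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Guard against empty denominators for classes never predicted or never seen.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


# Hypothetical labels and predictions, for illustration only.
labels = ["angry", "happy", "sad"]
y_true = ["angry", "angry", "happy", "happy", "sad", "sad"]
y_pred = ["angry", "happy", "happy", "happy", "sad", "angry"]

metrics = per_class_metrics(y_true, y_pred, labels)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Reporting the metrics per class, rather than accuracy alone, shows which emotions a model confuses, which is the kind of comparison the findings above summarize.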

## Abstract

This study introduces an advanced methodology for enhancing emotion recognition in Kashmiri speech by leveraging optimized feature selection and integrating temporal attention mechanisms into Long Short-Term Memory (LSTM) networks. A meticulous feature selection process identified key acoustic features, including Mel Frequency Cepstral Coefficients (MFCCs), Linear Predictive Coding (LPC), and other relevant descriptors, as optimal for emotion classification. The incorporation of temporal attention layers significantly improved the model's capacity to capture complex emotional patterns and temporal dynamics within the speech data. The proposed attention-augmented LSTM model achieved an accuracy of 90.2%, outperforming the baseline LSTM model's accuracy of 86%. Notable improvements in precision, recall, and F1-scores across multiple emotional categories further highlight the efficacy of the attention mechanism in capturing subtle emotional variations. In addition to performance gains, the study provides a clear research direction by demonstrating how attention-based temporal modeling can benefit low-resource languages such as Kashmiri, where linguistic and prosodic cues differ significantly from widely studied languages. The findings therefore establish a methodological baseline that supports future SER deployments in digital domains, including chat-based systems, affect-aware agents, and other human–machine interfaces. These findings underscore the model's ability to enhance both the sensitivity and specificity of emotion recognition systems, offering a robust and efficient framework for speech-based emotion analysis. Future work will extend the proposed methodology to multilingual settings and incorporate multimodal information, enabling deeper analysis of emotional expression across diverse linguistic and cultural contexts.
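
The abstract does not specify the form of its temporal attention layer. As a rough illustration of the general idea, the sketch below applies one common variant, additive attention pooling, over a sequence of per-frame hidden states. This is an assumption, not the authors' architecture: the function `temporal_attention`, the parameters `W`, `b`, and `v`, and the dimensions `T`, `H`, and `A` are hypothetical, and random numbers stand in for LSTM outputs and learned weights.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(hidden_states, W, b, v):
    """Additive attention pooling over time (one common variant).

    hidden_states: T vectors of size H (e.g. per-frame LSTM outputs)
    W: A x H projection, b: bias of size A, v: scoring vector of size A
    Returns (context vector of size H, attention weights of length T).
    """
    scores = []
    for h in hidden_states:
        # u = tanh(W h + b), then score = v . u
        u = [math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i)
             for row, b_i in zip(W, b)]
        scores.append(sum(v_i * u_i for v_i, u_i in zip(v, u)))
    weights = softmax(scores)  # one weight per time step, summing to 1
    size = len(hidden_states[0])
    # Context is the attention-weighted average of the hidden states.
    context = [sum(a * h[j] for a, h in zip(weights, hidden_states))
               for j in range(size)]
    return context, weights


# Toy input: random values stand in for LSTM hidden states and learned parameters.
random.seed(0)
T, H, A = 6, 4, 3
hidden_states = [[random.uniform(-1, 1) for _ in range(H)] for _ in range(T)]
W = [[random.uniform(-1, 1) for _ in range(H)] for _ in range(A)]
b = [0.0] * A
v = [random.uniform(-1, 1) for _ in range(A)]

context, weights = temporal_attention(hidden_states, W, b, v)
```

In a classifier of this kind, the context vector would typically feed a final dense layer, so frames with larger weights contribute more to the predicted emotion than they would under plain last-state or mean pooling.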

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13039000/full.md

## Figures

13 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13039000/full.md

## References

63 references — full list in the complete paper: https://tomesphere.com/paper/PMC13039000/full.md

---
Source: https://tomesphere.com/paper/PMC13039000