# Parallel Time-Frequency Multi-Scale Attention with Dynamic Convolution for Environmental Sound Classification

**Authors:** Hongjie Wan, Hailei He, Yuying Li

PMC · DOI: 10.3390/e27101007 · Entropy · 2025-09-26

## TL;DR

This paper introduces PTFMSA, a parallel time-frequency multi-scale attention module that improves dynamic convolution, and PTFMSAN, a compact raw-waveform network built on it that reaches 90% classification accuracy on ESC-50.

## Contribution

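The PTFMSA module combines local and global attention across multiple scales to improve dynamic convolution. Its parallel branch structure extracts time-domain and frequency-domain features separately to avoid mutual interference, and learnable parameters adjust the branch weights during training.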
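
As a rough illustration of how learnable parameters could weight two parallel branches, the sketch below fuses a time-branch and a frequency-branch feature vector with softmax-normalized scalars. The function names (`softmax`, `fuse_branches`), the use of softmax normalization, and the plain-list features are all assumptions made for illustration; the summary does not specify how the paper parameterizes or normalizes its branch weights.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_branches(time_feat, freq_feat, branch_logits):
    """Weighted sum of two branch outputs.

    branch_logits stands in for two learnable scalars (hypothetical);
    in a real network they would be updated by backpropagation.
    """
    if len(time_feat) != len(freq_feat):
        raise ValueError("branch outputs must have the same length")
    w_time, w_freq = softmax(branch_logits)
    return [w_time * t + w_freq * f for t, f in zip(time_feat, freq_feat)]


# Equal logits give each branch the same weight.
fused = fuse_branches([2.0, 4.0], [0.0, 0.0], [0.0, 0.0])
```

Normalizing the scalars keeps the fused output on the same scale as the individual branches, which is one common design choice; unnormalized weights would work as well.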

## Key findings

- PTFMSAN achieves 90% classification accuracy on the ESC-50 dataset, outperforming the baseline model and remaining competitive among CNN-based networks.
- Ablation studies confirm the effectiveness of the PTFMSA module and parallel branch structure.

## Abstract

Convolutional neural network (CNN) models are widely used for environmental sound classification (ESC). However, 2-D convolutions assume translation invariance along both the time and frequency axes, while in practice the frequency dimension is not shift-invariant. Additionally, single-scale convolutions limit the receptive field, leading to incomplete feature representation. To address these issues, we introduce a parallel time-frequency multi-scale attention (PTFMSA) module that integrates local and global attention across multiple scales to improve dynamic convolution. We also introduce a parallel branch structure to avoid mutual interference when extracting time-domain and frequency-domain features. Additionally, we use learnable parameters that dynamically adjust the weights of the different branches during network training. Building on this module, we develop PTFMSAN, a compact network that processes raw waveforms directly for ESC. To further strengthen learning, between-class (BC) training is applied. Experiments on the ESC-50 dataset show that PTFMSAN outperforms the baseline model, achieving a classification accuracy of 90%, which is competitive among CNN-based networks. We also performed ablation experiments to verify the effectiveness of each module.
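
Between-class training, as mentioned in the abstract, trains on mixtures of two examples from different classes with a correspondingly mixed label. The following is a minimal sketch of the basic linear mixing idea only; the function name `bc_mix`, the uniform sampling of the ratio, and the plain-list waveforms are assumptions, and the sketch omits any gain or loudness correction that a full implementation might apply.

```python
import random


def bc_mix(x1, y1, x2, y2, r=None, rng=random):
    """Mix two waveforms and their one-hot labels with ratio r.

    x1, x2: equal-length lists of samples from two different classes.
    y1, y2: one-hot label lists.
    r: mixing ratio in [0, 1]; drawn uniformly if not given (assumption).
    """
    if len(x1) != len(x2) or len(y1) != len(y2):
        raise ValueError("inputs and labels must have matching lengths")
    if r is None:
        r = rng.random()
    mixed_x = [r * a + (1.0 - r) * b for a, b in zip(x1, x2)]
    mixed_y = [r * a + (1.0 - r) * b for a, b in zip(y1, y2)]
    return mixed_x, mixed_y, r


# Toy two-sample "waveforms" with one-hot labels for two classes.
mx, my, ratio = bc_mix([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], r=0.25)
```

Because the target is a ratio rather than a single class, such mixtures are typically trained against a soft-label loss such as KL divergence instead of plain cross-entropy on a hard label.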

## Full-text entities

- **Diseases:** ESC (MESH:D018876), injury to (MESH:D014947), TDC (MESH:C536956)
- **Chemicals:** ESC-50 (-), GLU (MESH:D018698)
- **Species:** Canis lupus familiaris (dog, subspecies) [taxon 9615], Felis catus (cat, species) [taxon 9685], Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12564618/full.md

## Figures

13 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12564618/full.md

## References

36 references — full list in the complete paper: https://tomesphere.com/paper/PMC12564618/full.md

---
Source: https://tomesphere.com/paper/PMC12564618