An Investigation of the Effectiveness of Phase for Audio Classification

Shunsuke Hidaka; Kohei Wakamiya; Tokihiko Kaburagi

arXiv:2110.02878 · cs.SD · May 2, 2022


TL;DR

This paper investigates the role of phase information in audio classification. Across eight tasks, including phase significantly improves performance on several of them, such as musical pitch detection and speaker identification, but can cause overfitting to the recording condition on others.

Contribution

The study introduces a learnable front-end that computes the phase and its derivatives from a time-frequency representation with a mel-like frequency axis, and evaluates its effect on eight audio classification tasks, highlighting the importance of phase relationships.
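To make the idea of a phase-aware front-end concrete, here is a minimal sketch of splitting a complex time-frequency representation into log-amplitude and phase. It is not the authors' front-end: a fixed Hann-windowed DFT with a linear frequency axis stands in for the learnable, mel-like representation, and the names `stft` and `amplitude_and_phase`, the frame length, and the hop size are all hypothetical choices made for illustration. Only the standard library is used so the sketch runs anywhere.

```python
import cmath
import math


def stft(signal, frame_len=64, hop=16):
    """Hann-windowed DFT of overlapping frames.

    Returns frames[t][k] as complex values for bins k = 0..frame_len // 2.
    Assumption: a fixed DFT stands in for a learnable filterbank.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        bins = []
        for k in range(frame_len // 2 + 1):
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            bins.append(acc)
        frames.append(bins)
    return frames


def amplitude_and_phase(frames, eps=1e-8):
    """Split complex bins into log-amplitude and phase (radians in [-pi, pi])."""
    log_amp = [[math.log(abs(c) + eps) for c in frame] for frame in frames]
    phase = [[cmath.phase(c) for c in frame] for frame in frames]
    return log_amp, phase


# Example: a cosine centred on bin 8 of a 64-point frame.
signal = [math.cos(2 * math.pi * 8 * n / 64) for n in range(256)]
log_amp, phase = amplitude_and_phase(stft(signal))
```

The log-amplitude channel is what a conventional mel-spectrogram pipeline keeps. The phase channel is what such a pipeline discards, and it is the part this paper evaluates.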

Findings

01

Significant performance improvements for musical pitch detection, musical instrument detection, language identification, speaker identification, and birdsong detection.

02

Overfitting to the recording condition was observed for some tasks when instantaneous frequency was used.

03

The relationship between the phase values of adjacent elements matters more than the phase values themselves.
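The last finding can be illustrated with wrapped phase differences between neighbouring time-frequency elements. The sketch below is an assumption-laden illustration, not the paper's implementation: it approximates the phase derivatives with first-order finite differences, where the time-direction difference is proportional to instantaneous frequency and the frequency-direction difference is, up to sign and scale, the group delay. The helper names `wrap`, `time_phase_difference`, and `freq_phase_difference` are hypothetical.

```python
import math


def wrap(angle):
    """Map an angle in radians to the principal interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def time_phase_difference(phase):
    """Wrapped difference between consecutive frames, per bin.

    phase[t][k] is the phase at frame t and bin k.
    """
    return [[wrap(phase[t][k] - phase[t - 1][k])
             for k in range(len(phase[t]))]
            for t in range(1, len(phase))]


def freq_phase_difference(phase):
    """Wrapped difference between adjacent bins, per frame."""
    return [[wrap(row[k] - row[k - 1]) for k in range(1, len(row))]
            for row in phase]


# Synthetic phase advancing 2.5 rad per frame and 0.1 rad per bin.
phase = [[wrap(2.5 * t + 0.1 * k) for k in range(4)] for t in range(5)]

# The same pattern with a constant offset added to every element.
shifted = [[wrap(p + 2.0) for p in row] for row in phase]

dt = time_phase_difference(phase)            # about 2.5 everywhere
dt_shifted = time_phase_difference(shifted)  # also about 2.5 everywhere
```

The raw values in `phase` and `shifted` differ everywhere, yet their differences agree up to rounding. This is one way to see why relationships between adjacent phase values can carry more usable information than the absolute values.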

Abstract

While the log-amplitude mel-spectrogram has been widely used as the feature representation for deep-learning-based speech processing, the effectiveness of another aspect of the speech spectrum, i.e., phase information, has recently been shown for tasks such as speech enhancement and source separation. In this study, we extensively investigated the effectiveness of including the phase information of signals for eight audio classification tasks. We constructed a learnable front-end that can compute the phase and its derivatives based on a time-frequency representation with a mel-like frequency axis. Experimental results showed significant performance improvement for musical pitch detection, musical instrument detection, language identification, speaker identification, and birdsong detection. On the other hand, overfitting to the recording condition was observed for some tasks when the…


Code & Models

Repositories

onkyo14taro/investigation-phase (PyTorch, official)
