Shennong: a Python toolbox for audio speech features extraction

Mathieu Bernard; Maxime Poli; Julien Karadayi; Emmanuel Dupoux

arXiv:2112.05555·cs.CL·February 9, 2023

TL;DR

Shennong is an open-source Python toolkit that simplifies extraction of speech features, integrating multiple algorithms for speech analysis, normalization, and post-processing, facilitating research and development in speech processing.

Contribution

It introduces a comprehensive, extensible Python framework that consolidates various speech feature extraction algorithms, replacing multiple software tools with a unified solution.

Findings

01. Speech feature performance varies with task and conditions.

02. The effectiveness of Vocal Tract Length Normalization depends on speech duration.

03. Pitch estimation accuracy varies across noise environments.

Abstract

We introduce Shennong, a Python toolbox and command-line utility for speech feature extraction. It implements a wide range of well-established, state-of-the-art algorithms, including spectro-temporal filters such as Mel-Frequency Cepstral Filterbanks or Predictive Linear Filters, pre-trained neural networks, pitch estimators, as well as speaker normalization methods and post-processing algorithms. Shennong is an open-source, easy-to-use, reliable and extensible framework. The use of Python makes integration with other speech modeling and machine learning tools easy. It aims to replace or complement several heterogeneous software tools, such as Kaldi or Praat. After describing the Shennong software architecture, its core components and implemented algorithms, this paper illustrates its use on three applications: a comparison of speech feature performance on a phone discrimination task, an…
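
To make the idea of a post-processing algorithm concrete, here is a minimal sketch of per-utterance mean and variance normalization, a common post-processing step applied to speech features. It is written in plain Python and is not Shennong's API: the function name `cmvn`, its `eps` parameter, and the toy `features` matrix are all assumptions made for illustration.

```python
import math


def cmvn(frames, eps=1e-8):
    """Normalize each feature dimension to zero mean and unit variance.

    `frames` is a list of equal-length lists of floats, one list per frame.
    Hypothetical helper for illustration only, not a Shennong function.
    """
    if not frames:
        return []
    n_frames = len(frames)
    n_dims = len(frames[0])
    # Per-dimension mean over all frames of the utterance.
    means = [sum(f[d] for f in frames) / n_frames for d in range(n_dims)]
    # Per-dimension (population) standard deviation.
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n_frames)
        for d in range(n_dims)
    ]
    # Guard against division by zero for constant dimensions.
    return [
        [(f[d] - means[d]) / max(stds[d], eps) for d in range(n_dims)]
        for f in frames
    ]


# Toy feature matrix: 3 frames, 2 dimensions (made-up values).
features = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = cmvn(features)
```

After normalization, each column of `normalized` has zero mean and unit variance, which removes per-recording offsets and scale differences before features are compared or passed to a model.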

Code & Models

Repositories

bootphon/shennong
Official

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing