Shennong: a Python toolbox for audio speech features extraction
Mathieu Bernard, Maxime Poli, Julien Karadayi, Emmanuel Dupoux

TL;DR
Shennong is an open-source Python toolkit that simplifies extraction of speech features, integrating multiple algorithms for speech analysis, normalization, and post-processing, facilitating research and development in speech processing.
Contribution
The paper introduces a comprehensive, extensible Python framework that consolidates various speech feature extraction algorithms, replacing multiple software tools with a unified solution.
Findings
Speech feature performance varies with the task and conditions.
Vocal Tract Length Normalization effectiveness depends on speech duration.
Pitch estimation accuracy varies under different noise environments.
Abstract
We introduce Shennong, a Python toolbox and command-line utility for speech feature extraction. It implements a wide range of well-established, state-of-the-art algorithms, including spectro-temporal filters such as Mel-Frequency Cepstral Filterbanks or Predictive Linear Filters, pre-trained neural networks, pitch estimators, as well as speaker normalization methods and post-processing algorithms. Shennong is an open-source, easy-to-use, reliable and extensible framework. The use of Python makes integration with other speech modeling and machine learning tools easy. It aims to replace or complement several heterogeneous software tools, such as Kaldi or Praat. After describing the Shennong software architecture, its core components and implemented algorithms, this paper illustrates its use on three applications: a comparison of speech feature performance on a phone discrimination task, an…
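To make the "post-processing algorithms" mentioned in the abstract more concrete, here is a minimal sketch of one common post-processing step, cepstral mean and variance normalization (CMVN). It is not Shennong's implementation and does not use Shennong's API: the function name `cmvn`, the `eps` parameter, and the toy feature matrix are assumptions made for illustration, and the code is deliberately dependency-free.

```python
import math


def cmvn(frames, eps=1e-8):
    """Normalize each feature dimension to zero mean and unit variance.

    `frames` is a list of frames, each frame a list of feature values
    (for example cepstral coefficients). Hypothetical helper, not a
    Shennong function.
    """
    n_frames = len(frames)
    n_dims = len(frames[0])

    # Per-dimension mean over all frames.
    means = [sum(f[d] for f in frames) / n_frames for d in range(n_dims)]

    # Per-dimension (population) standard deviation over all frames.
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n_frames)
        for d in range(n_dims)
    ]

    # Subtract the mean and divide by the standard deviation;
    # `eps` guards against division by zero for constant dimensions.
    return [
        [(f[d] - means[d]) / max(stds[d], eps) for d in range(n_dims)]
        for f in frames
    ]


# Toy input: 3 frames with 2 feature dimensions each (made-up values).
features = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = cmvn(features)
```

After normalization, each feature dimension has zero mean and unit variance across frames, which reduces the influence of channel or speaker offsets on downstream models.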
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
