TorchDIVA: An Extensible Computational Model of Speech Production built on an Open-Source Machine Learning Library

Sean Kinahan; Julie Liss; Visar Berisha

arXiv:2210.09334·eess.AS·October 19, 2022

TL;DR

TorchDIVA is an open-source Python implementation of the DIVA speech production model, built on PyTorch. It integrates with modern machine learning tools, which makes the model easier to extend and allows improvements in speech quality.

Contribution

The paper introduces TorchDIVA, a complete translation of DIVA into Python using PyTorch. The port makes DIVA easier to integrate with machine learning tools, and the authors demonstrate its extensibility by integrating DiffWave to improve speech quality.

Findings

01 TorchDIVA closely matches the original DIVA model's outputs.

02 Integration with DiffWave improves speech quality metrics.

03 The Python implementation makes the model easier to extend and to use in research applications.

Abstract

The DIVA model is a computational model of speech motor control that combines a simulation of the brain regions responsible for speech production with a model of the human vocal tract. The model is currently implemented in Matlab Simulink; however, this is less than ideal as most of the development in speech technology research is done in Python. This means there is a wealth of machine learning tools which are freely available in the Python ecosystem that cannot be easily integrated with DIVA. We present TorchDIVA, a full rebuild of DIVA in Python using PyTorch tensors. DIVA source code was directly translated from Matlab to Python, and built-in Simulink signal blocks were implemented from scratch. After implementation, the accuracy of each module was evaluated via systematic block-by-block validation. The TorchDIVA model is shown to produce outputs that closely match those of the…
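The block-by-block validation mentioned in the abstract could be pictured roughly as follows. This is a minimal sketch, not code from the TorchDIVA repository: `DiscreteIntegrator` and `validate_block` are hypothetical names, the forward-Euler update is an assumed stand-in for one Simulink-style signal block, and the reference trace is computed analytically here, whereas in practice it would presumably be exported from the original Simulink model.

```python
import torch


class DiscreteIntegrator:
    """Hypothetical Simulink-style discrete-time integrator (forward Euler).

    Assumed update rule: y[n] = y[n-1] + dt * u[n-1], with y[0] = 0.
    """

    def __init__(self, dt: float, size: int):
        self.dt = dt
        self.state = torch.zeros(size, dtype=torch.float64)

    def step(self, u: torch.Tensor) -> torch.Tensor:
        # Emit the current state, then advance it using the new input.
        y = self.state.clone()
        self.state = self.state + self.dt * u
        return y


def validate_block(block, inputs: torch.Tensor, reference: torch.Tensor,
                   atol: float = 1e-9) -> float:
    """Run `block` one time step at a time and compare against a reference trace.

    Returns the maximum absolute error; raises AssertionError if the traces
    differ by more than the (assumed) tolerance.
    """
    outputs = torch.stack([block.step(u) for u in inputs])
    max_err = (outputs - reference).abs().max().item()
    if not torch.allclose(outputs, reference, atol=atol):
        raise AssertionError(f"block output deviates from reference: {max_err:.3e}")
    return max_err


# Constant unit input on two channels for five time steps.
dt = 0.5
inputs = torch.ones(5, 2, dtype=torch.float64)

# Analytic reference for a constant unit input: y[n] = n * dt on each channel.
reference = (dt * torch.arange(5, dtype=torch.float64)).unsqueeze(1).expand(5, 2)

max_err = validate_block(DiscreteIntegrator(dt=dt, size=2), inputs, reference)
print("max abs error:", max_err)
```

Running each block in isolation against a recorded trace keeps any mismatch local to one block, which is the appeal of validating block by block rather than only end to end.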

Code & Models

Repositories

skinahan/diva_pytorch (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Computational Physics and Python Applications