TorchDIVA: An Extensible Computational Model of Speech Production built on an Open-Source Machine Learning Library
Sean Kinahan, Julie Liss, Visar Berisha

TL;DR
TorchDIVA is an open-source Python implementation of the DIVA speech production model, built on PyTorch, that makes the model easier to extend with modern machine learning tools. As a demonstration, integrating it with DiffWave improves the quality of the synthesized speech.
Contribution
The paper introduces TorchDIVA, a complete translation of DIVA from Matlab Simulink to Python using PyTorch. The translation makes it easier to integrate DIVA with machine learning tools, and the authors demonstrate this extensibility by improving the quality of the model's speech output.
Findings
TorchDIVA closely matches the original DIVA model's outputs.
Integration with DiffWave improves speech quality metrics.
The Python implementation enables easier extensibility and research applications.
Abstract
The DIVA model is a computational model of speech motor control that combines a simulation of the brain regions responsible for speech production with a model of the human vocal tract. The model is currently implemented in Matlab Simulink; however, this is less than ideal as most of the development in speech technology research is done in Python. This means there is a wealth of machine learning tools which are freely available in the Python ecosystem that cannot be easily integrated with DIVA. We present TorchDIVA, a full rebuild of DIVA in Python using PyTorch tensors. DIVA source code was directly translated from Matlab to Python, and built-in Simulink signal blocks were implemented from scratch. After implementation, the accuracy of each module was evaluated via systematic block-by-block validation. The TorchDIVA model is shown to produce outputs that closely match those of the…
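The block-by-block validation mentioned in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the TorchDIVA source: the `UnitDelay` class is a hypothetical stand-in for a Simulink-style signal block reimplemented with PyTorch tensors, `validate_block` is a hypothetical helper, and the reference trace is made-up data standing in for outputs that would be exported from the original model.

```python
import torch


class UnitDelay:
    """Hypothetical stand-in for a Simulink-style unit delay: y[n] = x[n-1]."""

    def __init__(self, initial: torch.Tensor):
        # The state holds the previous input; it starts at the initial condition.
        self.state = initial.clone()

    def step(self, x: torch.Tensor) -> torch.Tensor:
        # Emit the stored sample, then store the current input for the next step.
        y = self.state
        self.state = x.clone()
        return y


def validate_block(block, inputs: torch.Tensor, reference: torch.Tensor, atol: float = 1e-6):
    """Feed `inputs` through `block` one time step at a time and compare
    the stacked outputs against a reference trace.

    Returns (passed, max_abs_error).
    """
    outputs = torch.stack([block.step(x) for x in inputs])
    max_err = (outputs - reference).abs().max().item()
    return max_err <= atol, max_err


# Made-up input signal and reference trace (illustrative only).
inputs = torch.tensor([[1.0], [2.0], [3.0]])
reference = torch.tensor([[0.0], [1.0], [2.0]])

ok, err = validate_block(UnitDelay(torch.zeros(1)), inputs, reference)
print(f"block matches reference: {ok}, max abs error: {err:.2e}")
```

Checking each block in isolation against a recorded trace localizes any mismatch to a single component, which is harder to do when only the end-to-end output is compared.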
Taxonomy
Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Computational Physics and Python Applications
