GIPFA: Generating IPA Pronunciation from Audio
Xavier Marjou

TL;DR
This paper introduces GIPFA, a system based on an artificial neural network (ANN) that automatically generates IPA pronunciations from audio. It correctly predicts 75% of the tested IPA pronunciations of French words and helps detect dataset errors and identify similar phonemes.
Contribution
The study presents a novel neural network approach to automatic IPA transcription from audio, tailored to French. An analysis of the model's mistakes also yields insights into dataset errors and phoneme similarities.
Findings
75% accuracy in IPA prediction on test data
Model helps identify dataset errors and phoneme similarities
Demonstrates feasibility of automated IPA transcription from audio
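The 75% figure is an accuracy over tested words. The summary does not define the metric. A natural reading is word-level exact match: a prediction counts as correct only if the whole IPA string equals the reference. The sketch below computes that metric under this assumption. The function name `word_accuracy` and the toy data are hypothetical and do not come from the paper. Unicode normalization is applied because IPA nasal vowels such as ɛ̃ use a combining tilde. Equivalent strings can therefore be encoded differently.

```python
import unicodedata


def word_accuracy(predictions: list[str], references: list[str]) -> float:
    """Hypothetical metric: fraction of words whose predicted IPA string
    exactly matches the reference after NFC normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(
        unicodedata.normalize("NFC", p) == unicodedata.normalize("NFC", r)
        for p, r in zip(predictions, references)
    )
    return correct / len(references)


# Toy data (invented): 3 of the 4 predictions match exactly.
refs = ["bɔ̃ʒuʁ", "ʃa", "mɛʁsi", "ʒaʁdɛ̃"]
preds = ["bɔ̃ʒuʁ", "ʃa", "mɛʁsi", "ʒaʁdɑ̃"]
print(word_accuracy(preds, refs))  # 0.75
```

Exact match is a strict metric, because a single wrong phoneme makes the whole word count as an error. A phoneme error rate would be a softer alternative, but nothing here indicates which one the paper uses.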
Abstract
Transcribing spoken audio samples into the International Phonetic Alphabet (IPA) has long been reserved for experts. In this study, we examine the use of an Artificial Neural Network (ANN) model to automatically extract the IPA phonemic pronunciation of a word from its audio pronunciation, hence the name Generating IPA Pronunciation From Audio (GIPFA). We trained the model on the French Wikimedia dictionary, and it then correctly predicted 75% of the IPA pronunciations tested. Interestingly, studying the model's inference errors made it possible to highlight possible errors in the dataset and to identify the closest phonemes in French.
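The abstract says that studying inference errors revealed the closest French phonemes, but it does not describe the procedure. One simple way to run such an analysis is to align each wrong prediction with its reference and count which phoneme was replaced by which. Frequent pairs then point to confusable phonemes. The sketch below takes this approach. It is a swapped-in technique using Python's `difflib.SequenceMatcher`, not the author's method, and the helper names and example words are hypothetical.

```python
import unicodedata
from collections import Counter
from difflib import SequenceMatcher


def segment(ipa: str) -> list[str]:
    """Split an IPA string into symbols, attaching combining diacritics
    (e.g. the nasal tilde) to the preceding base character."""
    symbols: list[str] = []
    for ch in unicodedata.normalize("NFD", ipa):
        if symbols and unicodedata.combining(ch):
            symbols[-1] += ch
        else:
            symbols.append(ch)
    return symbols


def substitution_counts(pairs: list[tuple[str, str]]) -> Counter:
    """Count (reference_symbol, predicted_symbol) substitutions over
    (prediction, reference) pairs, using a symbol-level alignment."""
    counts: Counter = Counter()
    for pred, ref in pairs:
        r, p = segment(ref), segment(pred)
        matcher = SequenceMatcher(a=r, b=p, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            # Keep only one-to-one replacements; insertions and deletions
            # do not pair a reference phoneme with a predicted one.
            if tag == "replace" and (i2 - i1) == (j2 - j1):
                for ref_sym, pred_sym in zip(r[i1:i2], p[j1:j2]):
                    counts[(ref_sym, pred_sym)] += 1
    return counts


# Invented (prediction, reference) pairs; the last one is correct.
pairs = [("ʒaʁdɑ̃", "ʒaʁdɛ̃"), ("bɑ̃", "bɛ̃"), ("ʃa", "ʃa")]
counts = substitution_counts(pairs)
print(counts.most_common(3))  # the ɛ̃ -> ɑ̃ substitution is counted twice
```

The same tally can also help flag suspect dataset entries. A reference that the model contradicts in an unusual way may itself be mistranscribed, though each case would still need checking by hand.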
Taxonomy
Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Speech and Audio Processing
