A Perceptual Alphabet for the 10-dimensional Phonetic-prosodic Space
Elaine Y L Tsiang

TL;DR
This paper introduces the IHA, a 10-dimensional perceptual alphabet for speech, based on observable features rather than articulatory ones, and discusses its implementation in a speech recognizer.
Contribution
It defines a 10-D perceptual phonetic-prosodic alphabet (IHA) and models speech as a symbolic sequence in this space, using perceptual observables as dimensions in place of articulatory specifications.
Findings
IHA has been implemented in a speech recognizer.
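Speech is modeled as a random chain in time over the 4-D phonetic subspace, augmented with diacritics from the remaining 6-D prosodic subspace.
The definitions are based on the oral billiards model of speech, which is presented in a separate paper.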
Abstract
We define an alphabet, the IHA, of the 10-D phonetic-prosodic space. The dimensions of this space are perceptual observables, rather than articulatory specifications. Speech is defined as a random chain in time of the 4-D phonetic subspace, that is, a symbolic sequence, augmented with diacritics of the remaining 6-D prosodic subspace. The definitions here are based on the model of speech of oral billiards, and supersede an earlier version. This paper only enumerates the IHA in detail as a supplement to the exposition of oral billiards in a separate paper. The IHA has been implemented as the target random variable in a speech recognizer.
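The structure described in the abstract (a symbolic sequence over a 4-D phonetic subspace, with each symbol carrying diacritics from a 6-D prosodic subspace) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the class `Letter`, the helper `phonetic_chain`, and the use of integer coordinates are hypothetical, and the summary does not name the individual dimensions or their value sets, so none are named below.

```python
from dataclasses import dataclass
from typing import List, Tuple

PHONETIC_DIMS = 4  # size of the phonetic subspace, as stated in the abstract
PROSODIC_DIMS = 6  # size of the prosodic subspace, as stated in the abstract


@dataclass(frozen=True)
class Letter:
    """One element of the chain (hypothetical representation).

    The integer coordinates are placeholders; the actual value sets
    of the IHA dimensions are not given in this summary.
    """
    phonetic: Tuple[int, ...]  # point in the 4-D phonetic subspace
    prosodic: Tuple[int, ...]  # diacritics from the 6-D prosodic subspace

    def __post_init__(self) -> None:
        # Reject letters whose coordinate counts do not match the subspaces.
        if len(self.phonetic) != PHONETIC_DIMS:
            raise ValueError("phonetic part must have 4 coordinates")
        if len(self.prosodic) != PROSODIC_DIMS:
            raise ValueError("prosodic part must have 6 coordinates")

    def as_point(self) -> Tuple[int, ...]:
        """Return the full 10-D phonetic-prosodic point."""
        return self.phonetic + self.prosodic


def phonetic_chain(utterance: List[Letter]) -> List[Tuple[int, ...]]:
    """Drop the diacritics, keeping only the 4-D symbolic sequence."""
    return [letter.phonetic for letter in utterance]
```

In this sketch an utterance is simply a list of `Letter` values in time order. `phonetic_chain` recovers the bare symbolic sequence, and `as_point` gives the 10-D coordinates of a single element.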
Taxonomy
Topics: Advanced Data Compression Techniques · Image Retrieval and Classification Techniques · Speech Recognition and Synthesis
