Definition of Visual Speech Element and Research on a Method of Extracting Feature Vector for Korean Lip-Reading
Ha Jong Won, Li Gwang Chol, Kim Hyok Chol, Li Kum Song (College of Computer Science, Kim Il Sung University)

TL;DR
This paper defines Korean visemes, proposes a method to extract 20-dimensional visual feature vectors combining static and dynamic features, and evaluates word recognition efficiency using a 3-viseme HMM.
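The summary does not say how the static and dynamic parts of the 20-dimensional vector are combined. A common convention in speech recognition is to append first-order regression ("delta") coefficients to each frame's static vector. That convention is assumed here; it is not confirmed by the paper. With 10 static values per video frame, this would give 20 dimensions. The function names `delta` and `combine_static_dynamic`, the 10 + 10 split, and the regression half-width of 2 frames are all hypothetical choices for illustration. The delta for frame t is computed as d_t = Σ_{n=1..N} n (c_{t+n} − c_{t−n}) / (2 Σ_{n=1..N} n²).

```python
from typing import List

Frame = List[float]  # one static lip-feature vector per video frame (assumed layout)


def delta(frames: List[Frame], window: int = 2) -> List[Frame]:
    """Regression-based dynamic (delta) coefficients over time.

    Assumption: standard speech-recognition delta formula with half-width
    `window`; edge frames are replicated so every frame has neighbours.
    """
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, window + 1))

    def at(t: int) -> Frame:
        # Clamp the index to replicate the first/last frame at the edges.
        return frames[min(max(t, 0), T - 1)]

    result: List[Frame] = []
    for t in range(T):
        dim = len(frames[t])
        d = [0.0] * dim
        for n in range(1, window + 1):
            nxt, prv = at(t + n), at(t - n)
            for k in range(dim):
                d[k] += n * (nxt[k] - prv[k])
        result.append([v / denom for v in d])
    return result


def combine_static_dynamic(frames: List[Frame], window: int = 2) -> List[Frame]:
    """Concatenate each static vector with its delta vector.

    Hypothetical: 10 static values per frame -> 20-dimensional combined vector.
    """
    deltas = delta(frames, window)
    return [list(s) + d for s, d in zip(frames, deltas)]


if __name__ == "__main__":
    # Toy input: 5 frames, each a 10-dimensional static vector rising linearly.
    frames = [[float(t)] * 10 for t in range(5)]
    combined = combine_static_dynamic(frames)
    print(len(combined), len(combined[0]))
```

Replicating edge frames is one simple boundary choice. Other toolkits pad differently, so edge-frame deltas would vary while interior frames would not.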
Contribution
It introduces a new viseme definition for Korean and a feature extraction method that enhances lip-reading accuracy.
Findings
Effective viseme definitions for Korean
Successful extraction of combined static and dynamic features
Improved word recognition accuracy with 3-viseme HMM
Abstract
In this paper, we defined the viseme (visual speech element) and described a method of extracting a visual feature vector. By analyzing Korean utterances, we defined 10 vowel-based visemes. We also proposed a method of extracting a 20-dimensional visual feature vector that combines static and dynamic features. Finally, we conducted a word recognition experiment based on a 3-viseme HMM and evaluated its efficiency.
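The abstract does not define the 3-viseme unit. Suppose it is a context-dependent model analogous to a triphone, meaning a viseme modelled together with its left and right neighbours. Under that assumption, a word's viseme sequence could be expanded into model labels as in the sketch below. The `l-c+r` label notation is borrowed from HTK triphone conventions. The labels `V1` to `V10` and the function name `to_triviseme` are hypothetical placeholders, not the paper's notation.

```python
from typing import List


def to_triviseme(visemes: List[str]) -> List[str]:
    """Expand a viseme sequence into context-dependent 3-viseme labels.

    Assumption: triphone-style units written as left-centre+right; at word
    boundaries the missing context is simply omitted.
    """
    units: List[str] = []
    for i, v in enumerate(visemes):
        left = visemes[i - 1] if i > 0 else None
        right = visemes[i + 1] if i + 1 < len(visemes) else None
        label = v
        if left is not None:
            label = f"{left}-{label}"
        if right is not None:
            label = f"{label}+{right}"
        units.append(label)
    return units


if __name__ == "__main__":
    # Hypothetical 3-viseme word.
    print(to_triviseme(["V1", "V4", "V2"]))
```

Each label would then select one HMM. A word model is the concatenation of those HMMs in order.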
Taxonomy
Topics: Speech and Audio Processing · Video Analysis and Summarization · Speech and Dialogue Systems
