Definition of Visual Speech Element and Research on a Method of Extracting Feature Vector for Korean Lip-Reading

Ha Jong Won, Li Gwang Chol, Kim Hyok Chol, Li Kum Song (College of Computer Science, Kim Il Sung University)

arXiv:1411.4114 · cs.CL · November 19, 2014

TL;DR

This paper defines Korean visemes, proposes a method to extract 20-dimensional visual feature vectors combining static and dynamic features, and evaluates word recognition efficiency using a 3-viseme HMM.

Contribution

It introduces a definition of 10 vowel-based visemes for Korean and a method of extracting a 20-dimensional visual feature vector for lip-reading.

Findings

01

Ten vowel-based visemes defined from an analysis of Korean utterances

02

A 20-dimensional visual feature vector combining static and dynamic features

03

Word recognition evaluated experimentally with a 3-viseme HMM

Abstract

In this paper, we define the viseme (visual speech element) and describe a method of extracting a visual feature vector. We define 10 vowel-based visemes by analyzing Korean utterances and propose a method of extracting a 20-dimensional visual feature vector that combines static and dynamic features. Finally, we conduct a word recognition experiment based on a 3-viseme HMM and evaluate its efficiency.
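
The abstract does not say how the 20 dimensions are divided between static and dynamic features. As a rough illustration only, the sketch below assumes 10 static measurements per video frame and takes the dynamic part to be their frame-to-frame differences, a common convention in speech feature extraction. The function names, the 10 + 10 split, and the toy input are assumptions made for this example, not the authors' method.

```python
from typing import List

Frame = List[float]


def delta(frames: List[Frame]) -> List[Frame]:
    """Frame-to-frame differences; the first frame gets all zeros."""
    deltas = []
    for t, frame in enumerate(frames):
        # Assumption: the first frame has no predecessor, so it is
        # compared with itself, which yields a zero delta.
        prev = frames[t - 1] if t > 0 else frame
        deltas.append([cur - p for cur, p in zip(frame, prev)])
    return deltas


def combine_static_dynamic(frames: List[Frame]) -> List[Frame]:
    """Concatenate each static frame with its dynamic (delta) counterpart."""
    return [static + dynamic for static, dynamic in zip(frames, delta(frames))]


# Toy input: 3 frames of 10 hypothetical static lip measurements each.
frames = [[float(t + i) for i in range(10)] for t in range(3)]
combined = combine_static_dynamic(frames)
print(len(combined), len(combined[0]))
```

With 10 static values per frame, each combined vector has 20 entries, matching the dimensionality stated in the abstract under the assumed split.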

Taxonomy

Topics: Speech and Audio Processing · Video Analysis and Summarization · Speech and dialogue systems