Harnessing AI for Speech Reconstruction using Multi-view Silent Video Feed

Yaman Kumar; Mayank Aggarwal; Pratham Nawal; Shin'ichi Satoh; Rajiv Ratn Shah; Roger Zimmermann

arXiv:1807.00619·cs.SD·August 14, 2018

TL;DR

This paper introduces the first multi-view speech reading system that uses multiple silent video feeds from different angles to improve speech reconstruction, addressing pose variations and enhancing intelligibility.

Contribution

It presents a novel multi-view approach for speech reconstruction from silent videos, including optimal camera placement and potential applications across multimedia fields.

Findings

1. Multi-view video improves speech reconstruction accuracy.
2. Optimal camera placement enhances speech intelligibility.
3. The system shows promise for security and multimedia analytics.

Abstract

Speechreading, or lipreading, is the technique of understanding speech and extracting phonetic features from a speaker's visual cues, such as the movement of the lips, face, teeth, and tongue. It has a wide range of multimedia applications, including surveillance, Internet telephony, and aids for people with hearing impairments. However, most work in speechreading has been limited to generating text from silent videos. Recent research has begun to generate (audio) speech from silent video sequences, but so far no work has dealt with divergent views and poses of a speaker. Thus, although multiple camera feeds of a speaker are often available, they have not been used to handle different poses. To this end, this paper presents the world's first multi-view speech reading and reconstruction system. This…
