Lip2AudSpec: Speech reconstruction from silent lip movements video

Hassan Akbari; Himani Arora; Liangliang Cao; Nima Mesgarani

arXiv:1710.09798·cs.CV·October 27, 2017

TL;DR

This paper introduces a deep neural network that reconstructs intelligible speech from silent lip videos using auditory spectrograms, achieving high correlation and improved speech quality.

Contribution

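It presents a deep learning approach to speech reconstruction from silent videos. An autoencoder is trained on auditory spectrograms, and a lip reading network made of CNN, LSTM, and fully connected layers predicts the autoencoder's bottleneck features from the video.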

Findings

01. The autoencoder reconstructs the original auditory spectrogram with 98% correlation.
02. Reconstructed speech has improved naturalness and intelligibility.
03. The model, trained jointly on different speakers, captures individual speaker characteristics.
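The 98% figure is a correlation between the original and the reconstructed auditory spectrogram. The sketch below shows one way such a score could be computed. It assumes a Pearson correlation over the flattened time-by-frequency arrays. The paper may instead average per-channel or per-utterance correlations. The function names are hypothetical.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if den == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return num / den


def spectrogram_correlation(original, reconstructed):
    """Flatten two (time x frequency) spectrograms and correlate them.

    Assumption: one global correlation over all bins, not per channel.
    """
    flat_o = [v for frame in original for v in frame]
    flat_r = [v for frame in reconstructed for v in frame]
    return pearson_correlation(flat_o, flat_r)


# A reconstruction that is a scaled copy of the original correlates perfectly.
print(spectrogram_correlation([[1.0, 2.0], [3.0, 4.0]],
                              [[2.0, 4.0], [6.0, 8.0]]))  # → 1.0
```

Correlation ignores overall gain and offset. It therefore measures whether the spectro-temporal pattern is recovered, not whether absolute levels match.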

Abstract

In this study, we propose a deep neural network for reconstructing intelligible speech from silent lip movement videos. We use the auditory spectrogram as the spectral representation of speech, together with its corresponding sound generation method, which results in more natural-sounding reconstructed speech. Our proposed network consists of an autoencoder that extracts bottleneck features from the auditory spectrogram. These features are then used as the target for our main lip reading network, which comprises CNN, LSTM, and fully connected layers. Our experiments show that the autoencoder is able to reconstruct the original auditory spectrogram with a 98% correlation and also improves the quality of reconstructed speech from the main lip reading network. Our model, trained jointly on different speakers, is able to extract individual speaker characteristics and gives promising results in reconstructing intelligible speech with…
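The sketch below traces the two-stage data flow described in the abstract, in pure Python with untrained random weights. The autoencoder is assumed to be trained first and held fixed. Its encoder output is the regression target for the lip reading network, and at inference the decoder turns the predicted code back into a spectrogram frame. Several parts are assumptions for illustration, not the authors' configuration:

* all layer sizes,
* the tanh activations,
* the single dense layer standing in for the CNN + LSTM + fully connected stack,
* every function name.

The final step, converting the auditory spectrogram back to a waveform, is not shown.

```python
import math
import random

# Hypothetical sizes for illustration only (not taken from the paper).
SPEC_BINS = 128   # frequency channels per auditory-spectrogram frame (assumed)
BOTTLENECK = 32   # autoencoder bottleneck width (assumed)
LIP_FEATS = 64    # per-frame visual features, standing in for CNN+LSTM output (assumed)


def random_matrix(rows, cols, seed):
    """Small random weight matrix stored as a list of rows."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def dense(weights, x, activation=None):
    """Fully connected layer without bias: y = f(W x)."""
    y = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return [activation(v) for v in y] if activation else y


def mse(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


# Stage 1: autoencoder on spectrogram frames (weights untrained in this sketch).
W_enc = random_matrix(BOTTLENECK, SPEC_BINS, seed=0)
W_dec = random_matrix(SPEC_BINS, BOTTLENECK, seed=1)


def encode(spec_frame):
    return dense(W_enc, spec_frame, math.tanh)


def decode(code):
    return dense(W_dec, code)


# Stage 2: lip reading network regresses the bottleneck code.
# One dense layer is a hypothetical stand-in for the CNN + LSTM + FC stack.
W_lip = random_matrix(BOTTLENECK, LIP_FEATS, seed=2)


def lip_network(visual_feats):
    return dense(W_lip, visual_feats, math.tanh)


def lip_training_loss(visual_feats, spec_frame):
    """Stage-2 loss: lip network output vs. the fixed encoder's bottleneck code."""
    return mse(lip_network(visual_feats), encode(spec_frame))


def reconstruct_spectrogram(visual_feats):
    """Inference: visual features -> predicted code -> spectrogram frame."""
    return decode(lip_network(visual_feats))


rng = random.Random(42)
spec = [rng.random() for _ in range(SPEC_BINS)]
video = [rng.random() for _ in range(LIP_FEATS)]
print(len(reconstruct_spectrogram(video)), lip_training_loss(video, spec) >= 0)  # → 128 True
```

Predicting a compact bottleneck code rather than every spectrogram bin gives the lip reading network a lower-dimensional target. Spectral detail is left to the pretrained decoder.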

Code & Models

Repositories

hassanhub/LipReading (official implementation, TensorFlow)

Taxonomy

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory