TL;DR
This paper explores how incorporating face masks into the training of Variational Autoencoders improves face reconstruction quality, especially when using SSIM loss, by focusing learning on relevant facial regions.
Contribution
It introduces a method that uses face masks to restrict VAE training to the pixels inside the face region, demonstrates improved face reconstruction, and analyzes the effects of different loss functions and architecture modifications.
Findings
Face masks improve reconstruction quality, especially when SSIM loss is combined with an l1 or l2 loss.
SSIM loss produces the sharpest images but alters colors.
Adding a decoder for face mask prediction affected performance with the l1 or l2 loss, but not with the SSIM loss.
Abstract
Variational Autoencoders (VAEs) employ deep learning models to learn a continuous latent z-space that underlies a high-dimensional observed dataset. This makes many tasks possible, including face reconstruction and face synthesis. In this work, we investigated how face masks can help the training of VAEs for face reconstruction by restricting the learning to the pixels selected by the face mask. An evaluation of the proposal using the CelebA dataset shows that the reconstructed images are enhanced with the face masks, especially when SSIM loss is used with either l1 or l2 loss functions. We noticed that the inclusion of a decoder for face mask prediction in the architecture affected the performance for l1 or l2 loss functions, while this was not the case for the SSIM loss. In addition, SSIM perceptual loss yielded the crispest samples among all hypotheses tested, although…
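The idea of restricting learning to the pixels selected by the face mask can be illustrated with a masked reconstruction loss. The following is a minimal NumPy sketch, not the authors' implementation: the function name `masked_reconstruction_loss`, the `(H, W, C)` image layout, and the binary `(H, W)` mask format are assumptions made for illustration, and the SSIM term and the VAE's KL term are omitted.

```python
import numpy as np


def masked_reconstruction_loss(x, x_hat, mask, kind="l1"):
    """Mean l1 or l2 error over the pixels selected by a binary face mask.

    Assumed (hypothetical) layout: x and x_hat are float arrays of shape
    (H, W, C); mask has shape (H, W) with 1 for face pixels, 0 elsewhere.
    """
    # Add a trailing axis so the mask broadcasts over the channel dimension.
    m = mask.astype(x.dtype)[..., None]
    diff = x - x_hat
    if kind == "l1":
        per_pixel = np.abs(diff)
    elif kind == "l2":
        per_pixel = diff ** 2
    else:
        raise ValueError("kind must be 'l1' or 'l2'")
    # Number of values that actually contribute (selected pixels x channels).
    n_selected = m.sum() * x.shape[-1]
    if n_selected == 0:
        return 0.0  # empty mask: nothing to penalize
    # Errors outside the mask are zeroed, so they produce no gradient signal.
    return float((per_pixel * m).sum() / n_selected)


# Toy example: one face pixel with a small error, one background pixel
# with a large error that the masked loss ignores.
x = np.zeros((2, 2, 3))
x_hat = np.zeros((2, 2, 3))
x_hat[0, 0, :] = 0.5   # error inside the face region
x_hat[1, 1, :] = 10.0  # error in the background
face_mask = np.array([[1, 0], [0, 0]])
loss_l1 = masked_reconstruction_loss(x, x_hat, face_mask, kind="l1")
```

Normalizing by the number of selected values rather than by the full image size keeps the loss scale comparable across images whose face regions differ in area; this is a design choice of the sketch, not something stated in the abstract.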
