Face2Text revisited: Improved data set and baseline results
Marc Tanti, Shaun Abdilla, Adrian Muscat, Claudia Borg, Reuben A. Farrugia, Albert Gatt

TL;DR
This paper introduces a new facial description dataset based on CelebA, evaluates baseline models using transfer learning from VGGFace/ResNet CNNs, and provides benchmarks for future research in face description generation.
Contribution
The paper presents a new facial description dataset and baseline models, advancing the development of human-focused image captioning methods.
Findings
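Descriptions generated by the VGGFace-LSTM + Attention model are closest to the ground truth according to human evaluation by 76 English-speaking participants.
The ResNet-LSTM + Attention model achieves the highest CIDEr and CIDEr-D scores (1.252 and 0.686 respectively).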
The dataset and results serve as benchmarks for future face description research.
Abstract
Current image description generation models do not transfer well to the task of describing human faces. To encourage the development of more human-focused descriptions, we developed a new data set of facial descriptions based on the CelebA image data set. We describe the properties of this data set, and present results from a face description generator trained on it, which explores the feasibility of using transfer learning from VGGFace/ResNet CNNs. Comparisons are drawn through both automated metrics and human evaluation by 76 English-speaking participants. The descriptions generated by the VGGFace-LSTM + Attention model are closest to the ground truth according to human evaluation whilst the ResNet-LSTM + Attention model obtained the highest CIDEr and CIDEr-D results (1.252 and 0.686 respectively). Together, the new data set and these experimental results provide data and baselines…
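The abstract names "LSTM + Attention" decoders on top of VGGFace/ResNet features but does not specify the attention mechanism here. As a rough illustration only, the sketch below shows one generic form of soft attention (dot-product scoring followed by a softmax) for a single decoding step. The function names `softmax` and `attend`, the dot-product scoring rule, and the toy numbers are all assumptions made for this example, not details taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features, hidden_state):
    """Return (weights, context) for one decoding step.

    region_features: list of feature vectors, one per image region
                     (stand-ins for CNN activations; hypothetical values).
    hidden_state:    decoder state vector of the same length.
    """
    # Score each region by its dot product with the decoder state
    # (an assumed scoring rule; other attention variants use a small MLP).
    scores = [
        sum(f * h for f, h in zip(region, hidden_state))
        for region in region_features
    ]
    weights = softmax(scores)
    dim = len(hidden_state)
    # Context vector: attention-weighted sum of the region features.
    context = [
        sum(w * region[d] for w, region in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, context


# Toy example: three regions with 2-dimensional features (made-up numbers).
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = [2.0, 0.0]
weights, context = attend(regions, state)
```

In this toy case the first and third regions receive equal weight because they have the same dot product with the decoder state, and the second region receives less. In a captioning model of this general kind, the context vector would typically be fed into the LSTM together with the previous word embedding at each step.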
Taxonomy
Topics: Face recognition and analysis · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
