FaceGemma: Enhancing Image Captioning with Facial Attributes for Portrait Images
Naimul Haque, Iffat Labiba, Sadia Akter

TL;DR
FaceGemma is a novel image captioning model that enhances portrait descriptions by integrating facial attributes. This yields more accurate and detailed captions, as reflected in improved BLEU-1 and METEOR scores.
Contribution
The paper introduces FaceGemma, a new approach that incorporates facial attributes and human-annotated data to improve image captioning for portraits.
Findings
Achieved BLEU-1 score of 0.364
Achieved METEOR score of 0.355
Significant improvement over baseline models
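BLEU-1 scores a candidate caption by its clipped unigram precision against one or more reference captions. It multiplies this by a brevity penalty, BP = 1 if c > r else exp(1 - r/c), where c is the candidate length and r is the reference length. The sketch below is a minimal sentence-level version. It assumes whitespace tokenization and scores a single caption. The paper's exact tokenizer and corpus-level aggregation are not stated here, so this sketch will not reproduce the reported 0.364. METEOR, which adds stemming, synonym matching and a fragmentation penalty, is not sketched.

```python
import math
from collections import Counter


def bleu1(candidate: list[str], references: list[list[str]]) -> float:
    """Sentence-level BLEU-1: brevity penalty times clipped unigram precision."""
    if not candidate or not references:
        return 0.0
    # For each token, the highest count it reaches in any single reference
    max_ref_counts = Counter()
    for ref in references:
        for tok, n in Counter(ref).items():
            max_ref_counts[tok] = max(max_ref_counts[tok], n)
    # Clip each candidate token count so repeated words cannot inflate precision
    clipped = sum(min(n, max_ref_counts[tok]) for tok, n in Counter(candidate).items())
    precision = clipped / len(candidate)
    # Brevity penalty against the reference length closest to the candidate length
    c = len(candidate)
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * precision


cand = "a young woman with blond hair smiles".split()
ref = "a smiling young woman with blond hair".split()
print(round(bleu1(cand, [ref]), 3))  # 6 of 7 unigrams match, equal lengths -> 0.857
```

Library implementations such as NLTK's `sentence_bleu` with `weights=(1, 0, 0, 0)` should compute the same quantity. They also offer smoothing options for short captions.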
Abstract
Automated image caption generation is essential for improving the accessibility and understanding of visual content. In this study, we introduce FaceGemma, a model that accurately describes facial attributes such as emotions, expressions, and features. Using FaceAttDB data, we generated descriptions for 2,000 faces with the Llama 3 70B model and fine-tuned the PaliGemma model on these descriptions. Based on the attributes and captions supplied in FaceAttDB, we created a new description dataset in which each description accurately depicts the human-annotated attributes, including key features such as attractiveness, full lips, big nose, blond hair, brown hair, bushy eyebrows, eyeglasses, male, smile, and youth. This detailed approach ensures that the generated descriptions closely align with the nuanced visual details present in the images. Our FaceGemma model leverages an innovative…
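The abstract describes a pipeline that turns FaceAttDB's human-annotated attributes and captions into free-text descriptions with Llama 3 70B, and then fine-tunes PaliGemma on those descriptions. Below is a hypothetical sketch of the first step. It assumes each face comes as a dictionary of binary attribute flags plus a caption. The key names, label wording and prompt text are illustrative only and are not taken from the paper or from the dataset's actual schema. The LLM call and the PaliGemma fine-tuning are not shown.

```python
# Hypothetical attribute keys and labels; the real FaceAttDB schema may differ.
ATTRIBUTE_LABELS = {
    "attractive": "attractive",
    "full_lips": "full lips",
    "big_nose": "big nose",
    "blond_hair": "blond hair",
    "brown_hair": "brown hair",
    "bushy_eyebrows": "bushy eyebrows",
    "eyeglasses": "eyeglasses",
    "male": "male",
    "smiling": "smiling",
    "young": "young",
}


def build_description_prompt(attrs: dict[str, bool], caption: str) -> str:
    """Build an LLM prompt that restricts the description to annotated attributes."""
    # Keep only attributes marked present, in the fixed order of ATTRIBUTE_LABELS
    present = [label for key, label in ATTRIBUTE_LABELS.items() if attrs.get(key, False)]
    attr_text = ", ".join(present) if present else "none annotated"
    return (
        "Write a two- or three-sentence description of the face in this portrait.\n"
        "Mention every listed attribute and do not add attributes that are not listed.\n"
        f"Reference caption: {caption}\n"
        f"Attributes: {attr_text}"
    )


prompt = build_description_prompt(
    {"smiling": True, "eyeglasses": True, "male": False},
    "A person sitting at a desk.",
)
print(prompt)
```

Passing only the attributes labeled as present, and telling the model not to add others, is one simple way to keep generated text aligned with the annotations. The paper, as summarized here, does not say whether absent attributes were also passed to the LLM.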
Taxonomy
Topics: Multimodal Machine Learning Applications · Subtitles and Audiovisual Media · Video Analysis and Summarization
Methods: LLaMA · Focus
