FaceGemma: Enhancing Image Captioning with Facial Attributes for Portrait Images

Naimul Haque; Iffat Labiba; Sadia Akter

arXiv:2309.13601 · cs.CV · July 16, 2024 · 2 cites

Open Access

TL;DR

FaceGemma is an image captioning model that integrates facial attributes into portrait descriptions, producing more accurate and detailed captions, as reflected in improved BLEU-1 and METEOR scores.

Contribution

The paper introduces FaceGemma, a new approach that incorporates facial attributes and human-annotated data to improve image captioning for portraits.

Findings

01. Achieved a BLEU-1 score of 0.364

02. Achieved a METEOR score of 0.355

03. Significant improvement over baseline models
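
For readers unfamiliar with the first metric, BLEU-1 is clipped unigram precision multiplied by a brevity penalty. The sketch below is a minimal sentence-level, single-reference version, assuming lowercase whitespace tokenization. The function name `bleu1` and the example captions are illustrative only, and this page does not say which evaluation tooling the authors used, so the sketch should not be expected to reproduce the reported 0.364.

```python
from collections import Counter
import math


def bleu1(candidate: str, reference: str) -> float:
    """Sentence-level BLEU-1 against one reference (illustrative helper)."""
    # Assumption: simple lowercase whitespace tokenization.
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0

    cand_counts = Counter(cand)
    ref_counts = Counter(ref)

    # Clip each candidate token count by its count in the reference,
    # so repeating a matching word does not inflate precision.
    clipped = sum(min(n, ref_counts[tok]) for tok, n in cand_counts.items())
    precision = clipped / len(cand)

    # Brevity penalty: penalize candidates shorter than the reference.
    if len(cand) >= len(ref):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref) / len(cand))

    return bp * precision


if __name__ == "__main__":
    # Hypothetical captions, not taken from the paper's dataset.
    generated = "a young man with eyeglasses is smiling"
    annotated = "a young man with brown hair and eyeglasses is smiling"
    print(f"BLEU-1: {bleu1(generated, annotated):.3f}")
```

Corpus-level BLEU, as usually reported in papers, aggregates the clipped counts and lengths over all captions before dividing, so averaging this per-sentence score over a dataset gives a related but not identical number.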

Abstract

Automated image caption generation is essential for improving the accessibility and understanding of visual content. In this study, we introduce FaceGemma, a model that accurately describes facial attributes such as emotions, expressions, and features. Using FaceAttDB data, we generated descriptions for 2,000 faces with the Llama 3 70B model and fine-tuned the PaliGemma model on these descriptions. Based on the attributes and captions supplied in FaceAttDB, we created a new description dataset in which each description perfectly depicts the human-annotated attributes, including key features such as attractiveness, full lips, big nose, blond hair, brown hair, bushy eyebrows, eyeglasses, male, smile, and youth. This detailed approach ensures that the generated descriptions are closely aligned with the nuanced visual details present in the images. Our FaceGemma model leverages an innovative…

Taxonomy

Topics: Multimodal Machine Learning Applications · Subtitles and Audiovisual Media · Video Analysis and Summarization

Methods: LLaMA · Focus