Facial Expression Recognition and Image Description Generation in Vietnamese
Khang Nhut Lam, Kim-Ngoc Thi Nguyen, Loc Huu Nguy, and Jugal Kalita

TL;DR
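This paper presents a multi-model approach to facial expression recognition and image description generation in Vietnamese. It combines YOLOv5, VGG16, and LSTM, with a traditional CNN serving as the emotion-recognition baseline, to produce descriptive sentences that cover both image content and the facial expressions of people in the image.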
Contribution
It introduces an integrated system that combines facial expression recognition and image captioning for Vietnamese. YOLOv5 handles emotion and dominant-color recognition, and a merged VGG16 and LSTM architecture generates the descriptions.
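The following is a minimal, pure-Python sketch of one next-word prediction step in a "merged" captioning architecture. It shows the idea only and is not the authors' implementation. It assumes the common merge design, in which the image feature vector and the partial caption are encoded separately and combined by element-wise addition only after the LSTM. The paper's exact layer sizes and merge operation are not given here. The weights are random, the dimensions are toy values, and all function names are hypothetical. In a real system, a VGG16 fc2 layer would supply a 4096-dimensional feature vector. The LSTM cell uses sigmoid for its gates and tanh for the candidate state.

```python
import math
import random

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def vec_add(*vectors):
    """Element-wise sum of equally sized vectors."""
    return [sum(vals) for vals in zip(*vectors)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def tanh(v):
    return [math.tanh(x) for x in v]

def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

class LSTMCell:
    """Single LSTM cell: sigmoid input/forget/output gates, tanh candidate."""

    def __init__(self, in_dim, hid_dim, rng):
        self.hid_dim = hid_dim
        # One (W, U, b) triple per gate: i, f, o and candidate g.
        self.params = {
            gate: (rand_matrix(hid_dim, in_dim, rng),
                   rand_matrix(hid_dim, hid_dim, rng),
                   [0.0] * hid_dim)
            for gate in "ifog"
        }

    def step(self, x, h, c):
        def pre(gate):
            W, U, b = self.params[gate]
            return vec_add(matvec(W, x), matvec(U, h), b)

        i, f, o = sigmoid(pre("i")), sigmoid(pre("f")), sigmoid(pre("o"))
        g = tanh(pre("g"))
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

def merge_next_word_probs(image_features, token_ids, emb, lstm, W_img, W_out):
    """Merge model: encode image and caption separately, combine after the LSTM."""
    # Image branch: project the CNN feature vector into the hidden space (ReLU).
    img = [max(0.0, v) for v in matvec(W_img, image_features)]
    # Text branch: run the partial caption's embeddings through the LSTM.
    h, c = [0.0] * lstm.hid_dim, [0.0] * lstm.hid_dim
    for t in token_ids:
        h, c = lstm.step(emb[t], h, c)
    # Merge by element-wise addition, then score every vocabulary word.
    merged = vec_add(img, h)
    return softmax(matvec(W_out, merged))

# Toy sizes; a real VGG16 fc2 feature vector has 4096 dimensions.
rng = random.Random(0)
VOCAB, EMB, HID, FEAT = 10, 4, 8, 16
emb = rand_matrix(VOCAB, EMB, rng)
lstm = LSTMCell(EMB, HID, rng)
W_img = rand_matrix(HID, FEAT, rng)
W_out = rand_matrix(VOCAB, HID, rng)
features = [rng.random() for _ in range(FEAT)]

probs = merge_next_word_probs(features, [1, 3, 5], emb, lstm, W_img, W_out)
print(max(range(VOCAB), key=probs.__getitem__))  # index of the most likely next word
```

Here, the image enters only after the LSTM, so the text encoder never sees visual features. This is what separates a merge model from an "inject" model, which feeds image features into the RNN itself.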
Findings
YOLOv5 outperforms a traditional CNN in emotion recognition on the KDEF dataset (accuracy 0.938 vs. 0.853)
The image description model achieves a BLEU-1 score of up to 0.628
Combining models improves the accuracy of visual and emotional content description.
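BLEU-1 is the unigram case of BLEU: clipped unigram precision multiplied by a brevity penalty. The sketch below computes corpus-level BLEU-1 following the standard BLEU definition. It is not necessarily the evaluation script the authors used. It assumes whitespace tokenization. For Vietnamese, whitespace splits text into syllables rather than words, so its scores can differ from a pipeline that applies word segmentation first.

```python
import math
from collections import Counter

def corpus_bleu1(candidates, references):
    """Corpus-level BLEU-1.

    candidates: list of generated captions (strings)
    references: list of lists of reference captions, one list per candidate
    """
    clipped, cand_len, ref_len = 0, 0, 0
    for cand_text, ref_texts in zip(candidates, references):
        cand = cand_text.split()
        refs = [r.split() for r in ref_texts]
        # Each word counts at most as often as it appears in any single reference.
        max_ref = Counter()
        for ref in refs:
            for word, n in Counter(ref).items():
                max_ref[word] = max(max_ref[word], n)
        clipped += sum(min(n, max_ref[word]) for word, n in Counter(cand).items())
        cand_len += len(cand)
        # Effective reference length: the one closest to the candidate length
        # (ties go to the shorter reference).
        ref_len += min((abs(len(ref) - len(cand)), len(ref)) for ref in refs)[1]
    if cand_len == 0:
        return 0.0
    precision = clipped / cand_len
    # Brevity penalty: only penalize output that is shorter than the references.
    bp = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return bp * precision

print(corpus_bleu1(["a man is smiling"], [["a man is smiling", "a smiling man"]]))
# → 1.0
```

Clipping keeps a degenerate caption such as "the the the the" from scoring well. The brevity penalty keeps very short captions from winning on precision alone.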
Abstract
This paper discusses a facial expression recognition model and a description generation model that together build descriptive sentences for images and for the facial expressions of people in them. Our study shows that YOLOv5 achieves better results than a traditional CNN for all emotions on the KDEF dataset: the accuracies of the CNN and YOLOv5 models for emotion recognition are 0.853 and 0.938, respectively. We propose a model for generating image descriptions based on a merged architecture, using VGG16 with the descriptions encoded by an LSTM model. YOLOv5 is also used to recognize the dominant colors of objects in the images and to correct color words in the generated descriptions where necessary. If a description contains words referring to a person, we recognize the emotion of that person in the image. Finally, we combine the results of all models to create sentences that…
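The abstract describes two post-processing steps: color-word correction, followed by emotion recognition that runs only when the caption mentions a person. These can be sketched as plain control flow. This is a hypothetical illustration. The detector outputs are passed in as a dict (noun to dominant color) and as a callable standing in for the YOLOv5 emotion model. The word lists and the final sentence template are invented for the example. English words stand in for the Vietnamese vocabulary. Vietnamese places adjectives after the noun, so a real implementation would check word order accordingly.

```python
from typing import Callable, Dict

# Hypothetical word lists; the real system works on Vietnamese vocabulary.
COLOR_WORDS = {"red", "blue", "green", "yellow", "white", "black"}
PERSON_WORDS = {"man", "woman", "boy", "girl", "person", "people"}

def correct_colors(caption: str, detected_colors: Dict[str, str]) -> str:
    """Replace a color word when the detector reports a different dominant
    color for the noun that follows it (English adjective-noun order)."""
    tokens = caption.split()
    for k in range(len(tokens) - 1):
        color, noun = tokens[k], tokens[k + 1]
        if color in COLOR_WORDS and detected_colors.get(noun, color) != color:
            tokens[k] = detected_colors[noun]
    return " ".join(tokens)

def describe(caption: str,
             detected_colors: Dict[str, str],
             recognize_emotion: Callable[[], str]) -> str:
    """Combine the caption, color correction and (conditionally) emotion recognition."""
    caption = correct_colors(caption, detected_colors)
    # Only run emotion recognition when the caption mentions a person.
    if any(token in PERSON_WORDS for token in caption.split()):
        caption += f", the person looks {recognize_emotion()}"
    return caption

print(describe("a man in a red shirt is standing", {"shirt": "blue"}, lambda: "happy"))
# → a man in a blue shirt is standing, the person looks happy
```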
Taxonomy
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
