Towards Fine-grained Image Classification with Generative Adversarial Networks and Facial Landmark Detection
Mahdi Darvish, Mahsa Pouramini, Hamid Bahador

TL;DR
This paper enhances fine-grained image classification by using facial landmark detection and improved GAN-based data augmentation with StyleGAN2-ADA, boosting the performance of Vision Transformer models on the Oxford-IIIT Pets dataset.
Contribution
It introduces a novel approach combining facial landmark detection with advanced GAN augmentation to improve fine-grained classification accuracy.
Findings
GAN-based augmentation improves classification accuracy
Facial landmark cropping enhances image realism
Method outperforms standard augmentation techniques
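The landmark cropping mentioned in the findings can be pictured with a minimal sketch. It assumes landmarks arrive as (x, y) pixel coordinates from some detector and that the crop is a padded bounding box around them. The function name `landmark_crop_box` and the `margin` parameter are hypothetical, chosen for illustration, and are not taken from the paper.

```python
import math
from typing import Iterable, Tuple

Point = Tuple[float, float]
Box = Tuple[int, int, int, int]


def landmark_crop_box(landmarks: Iterable[Point],
                      image_size: Tuple[int, int],
                      margin: float = 0.25) -> Box:
    """Return a (left, top, right, bottom) box around the landmarks.

    `margin` (hypothetical parameter) pads the tight landmark box by a
    fraction of its width and height; the result is clamped to the image.
    """
    points = list(landmarks)
    if not points:
        raise ValueError("at least one landmark is required")
    width, height = image_size
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    # Padding is proportional to the extent of the landmarks on each axis.
    pad_x = (max(xs) - min(xs)) * margin
    pad_y = (max(ys) - min(ys)) * margin
    left = max(0, math.floor(min(xs) - pad_x))
    top = max(0, math.floor(min(ys) - pad_y))
    right = min(width, math.ceil(max(xs) + pad_x))
    bottom = min(height, math.ceil(max(ys) + pad_y))
    return left, top, right, bottom


# Example: two eyes and a nose in a 500x375 image (made-up coordinates).
box = landmark_crop_box([(200, 120), (280, 120), (240, 180)], (500, 375))
```

The returned tuple uses the (left, upper, right, lower) ordering that Pillow's `Image.crop` accepts, so a crop like this could be applied to each training image before it is passed to the GAN. Whether the authors pad or clamp in this way is not stated in the text above.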
Abstract
Fine-grained classification remains a challenging task because distinguishing categories requires learning complex, local differences. Diversity in the pose, scale, and position of objects in an image makes the problem even more difficult. Although recent Vision Transformer models achieve high performance, they need a large volume of training data. To counter this problem, we used GAN-based data augmentation to generate additional dataset instances. Oxford-IIIT Pets was our dataset of choice for this experiment. It consists of 37 breeds of cats and dogs with variations in scale, pose, and lighting, which increases the difficulty of the classification task. Furthermore, we enhanced the performance of a recent Generative Adversarial Network (GAN), the StyleGAN2-ADA model, to generate more realistic images while preventing overfitting to the training set. We did this by…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques · Image Processing Techniques and Applications
Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam · Residual Connection · Layer Normalization · Dropout · Softmax · Depthwise Convolution
