Pre-Trained Language Transformers are Universal Image Classifiers
Rahul Goel, Modar Sulaiman, Kimia Noorbakhsh, Mahdi Sharifi, Rajesh Sharma, Pooyan Jamshidi, Kallol Roy

TL;DR
This paper demonstrates that a pretrained GPT-2 transformer, fine-tuned with frozen layers, can serve as a universal image classifier, effectively classifying facial images and encrypted images with high accuracy, highlighting privacy and bias considerations.
Contribution
It introduces a novel method of using a pretrained text transformer as a universal image classifier, including for encrypted images, leveraging its meta-learning capacity and heavy-tail distribution properties.
Findings
High accuracy classification on raw facial images
Effective classification of encrypted facial images
Potential for privacy-preserving machine learning
Abstract
Facial images disclose many hidden personal traits such as age, gender, race, health, emotion, and psychology. Understanding these traits helps to classify people by different attributes. In this paper, we present a novel method for classifying images using a pretrained transformer model. We apply the pretrained transformer to the binary classification of facial images into criminal and non-criminal classes. The GPT-2 transformer is pretrained to generate text and then fine-tuned to classify facial images. During fine-tuning on images, most of the layers of GPT-2 are frozen during backpropagation; the resulting model is a frozen pretrained transformer (FPT). The FPT acts as a universal image classifier, and this paper shows the application of FPT to facial images. We also use our FPT to classify encrypted images. Our FPT shows high accuracy on both…
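The freezing step described in the abstract can be sketched as a simple rule over parameter names. This is a minimal illustration, not the authors' code: the abstract only says that most GPT-2 layers are frozen, so the choice of what stays trainable here (layer norms, positional embeddings, and two hypothetical new layers, `input_proj` and `classifier`) is an assumption borrowed from the common frozen-pretrained-transformer recipe. The parameter names follow a GPT-2 style layout and are illustrative.

```python
# Illustrative parameter names in a GPT-2 style layout (one block shown).
# They are not read from a real checkpoint.
PARAM_NAMES = [
    "wte.weight",              # token embeddings (unused for image inputs)
    "wpe.weight",              # positional embeddings
    "h.0.ln_1.weight",         # layer norm before attention
    "h.0.attn.c_attn.weight",  # self-attention projections
    "h.0.attn.c_proj.weight",
    "h.0.ln_2.weight",         # layer norm before the feed-forward block
    "h.0.mlp.c_fc.weight",     # feed-forward block
    "h.0.mlp.c_proj.weight",
    "ln_f.weight",             # final layer norm
    "input_proj.weight",       # hypothetical: maps image patches to embeddings
    "classifier.weight",       # hypothetical: binary classification head
]

# Assumption: substrings marking the parameters that remain trainable.
# Everything else (attention and feed-forward weights) is frozen.
TRAINABLE_MARKERS = ("ln_", "wpe", "input_proj", "classifier")


def is_trainable(name: str) -> bool:
    """Return True if the named parameter should receive gradient updates."""
    return any(marker in name for marker in TRAINABLE_MARKERS)


def split_parameters(names):
    """Partition parameter names into (trainable, frozen) lists."""
    trainable = [n for n in names if is_trainable(n)]
    frozen = [n for n in names if not is_trainable(n)]
    return trainable, frozen


trainable, frozen = split_parameters(PARAM_NAMES)
print("trainable:", trainable)
print("frozen:", frozen)
```

With a PyTorch model, the same predicate could be applied while iterating over `model.named_parameters()`, setting `param.requires_grad = is_trainable(name)` for each parameter so that backpropagation updates only the unfrozen ones.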
Taxonomy
Topics: Digital Media Forensic Detection · Authorship Attribution and Profiling
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Discriminative Fine-Tuning · Attention Dropout · Residual Connection · Dropout · Byte Pair Encoding · Cosine Annealing
