Pre-Trained Language Transformers are Universal Image Classifiers

Rahul Goel; Modar Sulaiman; Kimia Noorbakhsh; Mahdi Sharifi; Rajesh Sharma; Pooyan Jamshidi; Kallol Roy

arXiv:2201.10182·cs.CV·January 26, 2022·1 cites

TL;DR

This paper shows that a pretrained GPT-2 transformer, fine-tuned with most of its layers frozen, can serve as a universal image classifier that classifies both raw and encrypted facial images with high accuracy, highlighting privacy and bias considerations.

Contribution

It introduces a novel method of using a pretrained text transformer as a universal image classifier, including for encrypted images, leveraging its meta-learning capacity and heavy-tail distribution properties.

Findings

01 High-accuracy classification of raw facial images

02 Effective classification of encrypted facial images

03 Potential for privacy-preserving machine learning

Abstract

Facial images disclose many hidden personal traits such as age, gender, race, health, emotion, and psychology. Understanding these traits helps to classify people by different attributes. In this paper, we present a novel method for classifying images using a pretrained transformer model. We apply the pretrained transformer to the binary classification of facial images into criminal and non-criminal classes. GPT-2 is pretrained to generate text and then fine-tuned to classify facial images. During fine-tuning on images, most of the layers of GPT-2 are frozen during backpropagation; the resulting model is a frozen pretrained transformer (FPT). The FPT acts as a universal image classifier, and this paper shows the application of FPT to facial images. We also use our FPT to classify encrypted images. Our FPT shows high accuracy on both…
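
The freezing step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which GPT-2 layers stay trainable, so the rule below is an assumption: it keeps layer norms, positional embeddings, and the input and output layers trainable, and freezes the self-attention and feed-forward weights. The parameter names mimic a GPT-2 style layout, and `input_proj` and `classifier` are hypothetical names for an image input layer and a classification head.

```python
from fnmatch import fnmatch

# ASSUMPTION: which parameters remain trainable is not given in the abstract.
# These patterns keep layer norms, positional embeddings, and the
# (hypothetical) input projection and classifier head trainable.
TRAINABLE_PATTERNS = ("*ln_*", "wpe.*", "input_proj.*", "classifier.*")


def split_parameters(param_names, trainable_patterns=TRAINABLE_PATTERNS):
    """Partition parameter names into (trainable, frozen) lists."""
    trainable, frozen = [], []
    for name in param_names:
        if any(fnmatch(name, pattern) for pattern in trainable_patterns):
            trainable.append(name)
        else:
            frozen.append(name)
    return trainable, frozen


# Illustrative parameter names for a model with one transformer block.
example_names = [
    "input_proj.weight",       # hypothetical image-to-embedding layer
    "wpe.weight",              # positional embeddings
    "h.0.ln_1.weight",         # layer norm before attention
    "h.0.attn.c_attn.weight",  # self-attention projections
    "h.0.attn.c_proj.weight",
    "h.0.ln_2.weight",         # layer norm before the feed-forward block
    "h.0.mlp.c_fc.weight",     # feed-forward weights
    "h.0.mlp.c_proj.weight",
    "ln_f.weight",             # final layer norm
    "classifier.weight",       # hypothetical binary classification head
]

trainable, frozen = split_parameters(example_names)
print("trainable:", trainable)
print("frozen:   ", frozen)
```

In a PyTorch model, the same split would typically be applied by iterating over `model.named_parameters()` and setting `requires_grad = False` on every parameter whose name lands in the frozen list, so that backpropagation updates only the remaining parameters.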

Taxonomy

Topics: Digital Media Forensic Detection · Authorship Attribution and Profiling

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Discriminative Fine-Tuning · Attention Dropout · Residual Connection · Dropout · Byte Pair Encoding · Cosine Annealing