Experimenting with Self-Supervision using Rotation Prediction for Image Captioning
Ahmed Elhagry, Karima Kadaoui

TL;DR
This paper explores a self-supervised approach for image captioning by training a CNN encoder on image rotation prediction, aiming to reduce reliance on manual annotations while maintaining captioning quality.
Contribution
It introduces a novel self-supervised training method for image feature extraction using rotation prediction within an image captioning framework.
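To make the rotation-prediction pretext task concrete, here is a minimal sketch of how training pairs can be built from unlabeled images. It assumes the four rotations 0, 90, 180 and 270 degrees, which this summary does not specify. The helper names `rotate90` and `make_rotation_examples` are hypothetical, and images are plain nested lists rather than real tensors.

```python
def rotate90(img):
    """Rotate a 2D grid (list of rows) by 90 degrees counter-clockwise."""
    # Transposing and then reversing the row order gives a CCW quarter turn.
    return [list(row) for row in zip(*img)][::-1]


def make_rotation_examples(img):
    """Return (rotated_image, label) pairs for one unlabeled image.

    Assumption: label k means the image was rotated by k * 90 degrees.
    The label comes from the transformation itself, so no human
    annotation is needed.
    """
    examples = []
    rotated = [list(row) for row in img]  # copy so the input is not mutated
    for label in range(4):
        examples.append((rotated, label))
        rotated = rotate90(rotated)
    return examples


if __name__ == "__main__":
    toy_image = [[1, 2],
                 [3, 4]]
    for rotated, label in make_rotation_examples(toy_image):
        print(label, rotated)
```

An encoder trained to classify these four labels has to pick up on object orientation and layout. Those learned features, rather than features from a label-supervised classifier, would then be passed to the caption decoder.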
Findings
The self-supervised CNN encoder learns competitive image features.
The approach reduces dependence on labeled data for captioning.
Preliminary results show promising caption quality with self-supervised features.
Abstract
Image captioning is a task in the field of Artificial Intelligence that merges computer vision and natural language processing. It consists of generating captions that describe images, and has various applications, such as descriptions used by assistive technology or image indexing (for search engines, for instance). This makes it a crucial topic in AI that is the subject of much research. This task, however, like many others, is trained on large sets of images labeled via human annotation, which can be very cumbersome: it requires manual effort, incurs both financial and temporal costs, is error-prone, and can be difficult to execute in some cases (e.g. medical images). To mitigate the need for labels, we attempt to use self-supervised learning, a type of learning where models use the data contained within the images themselves as labels. It is challenging to accomplish though, since the…
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
