Compressed Image Captioning using CNN-based Encoder-Decoder Framework

Md Alif Rahman Ridoy; M Mahmud Hasan; Shovon Bhowmick

arXiv:2404.18062 · cs.CV · April 30, 2024 · 2 cites

TL;DR

This paper presents a CNN-based encoder-decoder framework for image captioning, exploring model compression techniques to improve efficiency while maintaining captioning performance.

Contribution

It introduces a novel combination of CNN feature extraction with encoder-decoder models and investigates model compression for resource-efficient captioning.

Findings

01. Pre-trained CNN models vary in captioning performance.

02. Frequency regularization can effectively compress CNN models.

03. Compressed models retain comparable captioning accuracy.
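
The compression finding can be made concrete with a small sketch. The summary here does not spell out the paper's procedure, so the following is only an illustration of the general idea behind frequency-domain compression, under the assumption that weights are moved to a DCT basis and the high-frequency coefficients are dropped. The function names (`dct`, `idct`, `compress`, `decompress`) and the `keep` parameter are hypothetical and do not come from the paper.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a 1-D list of floats."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct(c):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = 0.0
        for k in range(n):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            s += scale * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(s)
    return out


def compress(weights, keep):
    """Assumed scheme: store only the `keep` lowest-frequency coefficients."""
    return dct(weights)[:keep]


def decompress(kept, n):
    """Pad the dropped high-frequency coefficients with zeros, then invert."""
    return idct(list(kept) + [0.0] * (n - len(kept)))


if __name__ == "__main__":
    # Toy "weight vector"; real CNN layers would be flattened tensors.
    w = [0.5, 0.4, 0.35, 0.3, 0.1, 0.05, 0.0, -0.1]
    approx = decompress(compress(w, 3), len(w))
    err = max(abs(a - b) for a, b in zip(w, approx))
    print(f"stored 3 of {len(w)} coefficients, max abs error = {err:.4f}")
```

Keeping every coefficient reproduces the weights up to floating-point error, while keeping fewer trades reconstruction accuracy for storage. Whether that trade preserves captioning quality is an empirical question this sketch does not address.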

Abstract

Image processing plays a crucial role across many fields, from scientific research to industrial applications, and image captioning is one particularly exciting application. Its potential impact is vast: it can significantly boost the accuracy of search engines, making relevant information easier to find, and it can greatly enhance accessibility for visually impaired individuals by giving them a more immersive experience of digital content. Despite this promise, image captioning presents several challenges. One major hurdle is extracting meaningful visual information from images and transforming it into coherent language, which requires bridging the gap between the visual and linguistic domains and demands sophisticated algorithms and models. Our project is focused on addressing these challenges by…

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization · Multimodal Machine Learning Applications