Siamese-Driven Optimization for Low-Resolution Image Latent Embedding in Image Captioning
Jing Jie Tan, Anissa Mokraoui, Ban-Hoe Kwan, Danny Wee-Kiat Ng, Yan-Chai Hum

TL;DR
This paper introduces SOLI, a Siamese network-based method that optimizes latent embeddings to make low-resolution image captioning more efficient and accurate without relying on heavyweight models, which makes it well suited to resource-constrained environments.
Contribution
The paper presents a novel Siamese-driven approach tailored for low-resolution image captioning, reducing computational costs while maintaining high performance.
Findings
Enhanced captioning accuracy for low-resolution images.
Reduced computational overhead compared to transformer-based models.
Effective in resource-constrained scenarios.
Abstract
Image captioning is essential in many fields, including assisting visually impaired individuals, improving content management systems, and enhancing human-computer interaction. However, a recent challenge in this domain is dealing with low-resolution images (LRI). While performance can be improved by using larger models such as transformers for encoding, these models are typically heavyweight and demand significant computational resources and memory, which makes retraining difficult. To address this, the proposed SOLI (Siamese-Driven Optimization for Low-Resolution Image Latent Embedding in Image Captioning) approach offers a lightweight solution designed specifically for captioning low-resolution images. It employs a Siamese network architecture to optimize latent embeddings, enhancing the efficiency and accuracy of the image-to-text translation process. By focusing on a dual-pathway…
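To make the idea of a Siamese network optimizing latent embeddings more concrete, here is a minimal sketch in NumPy. It is not the authors' implementation: the abstract is truncated and does not specify the encoder, the inputs to the two branches, or the loss. The sketch assumes that one branch sees a reference image, the other sees a degraded (downsampled then upsampled) copy, both branches share the same weights, and the objective is the squared distance between L2-normalized embeddings. All function names (`downsample`, `upsample`, `encode`, `siamese_alignment_loss`) and the linear encoder are hypothetical.

```python
import numpy as np

def downsample(img, factor):
    """Average-pool a (H, W) image by an integer factor (assumed degradation)."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def upsample(img, factor):
    """Nearest-neighbour upsampling so both branches receive the same input size."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

def encode(img, weights):
    """Shared encoder (hypothetical): flatten, project linearly, L2-normalise."""
    z = img.reshape(-1) @ weights
    return z / (np.linalg.norm(z) + 1e-8)

def siamese_alignment_loss(img, weights, factor=4):
    """Squared distance between embeddings of the reference and degraded image.

    Both branches call `encode` with the SAME weights, which is what makes
    the arrangement Siamese. The choice of loss is an assumption.
    """
    low = upsample(downsample(img, factor), factor)
    z_ref = encode(img, weights)
    z_low = encode(low, weights)
    return float(np.sum((z_ref - z_low) ** 2))

rng = np.random.default_rng(0)
image = rng.random((32, 32))           # stand-in for a grayscale image
W = rng.normal(size=(32 * 32, 16))     # shared projection to a 16-d latent
loss = siamese_alignment_loss(image, W)
```

Minimizing a loss of this kind with respect to the shared weights would pull the low-resolution embedding toward the reference embedding, so a downstream caption decoder sees similar latents for both inputs. Because both embeddings are unit vectors, the loss is bounded between 0 and 4.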
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Topic Modeling
