Siamese-Driven Optimization for Low-Resolution Image Latent Embedding in Image Captioning

Jing Jie Tan; Anissa Mokraoui; Ban-Hoe Kwan; Danny Wee-Kiat Ng; Yan-Chai Hum

arXiv:2512.08873·cs.CV·December 10, 2025

Open Access

TL;DR

This paper introduces SOLI, a Siamese network-based method that optimizes latent embeddings to improve the efficiency and accuracy of low-resolution image captioning without relying on heavyweight models, which makes it suited to resource-constrained environments.

Contribution

The paper presents a novel Siamese-driven approach tailored for low-resolution image captioning, reducing computational costs while maintaining high performance.

Findings

01. Enhanced captioning accuracy for low-resolution images.

02. Reduced computational overhead compared to transformer-based models.

03. Effective in resource-constrained scenarios.

Abstract

Image captioning is essential in many fields, including assisting visually impaired individuals, improving content management systems, and enhancing human-computer interaction. However, a recent challenge in this domain is dealing with low-resolution images (LRIs). While performance can be improved by using larger models such as transformers for encoding, these models are typically heavyweight and demand significant computational resources and memory, which makes retraining difficult. To address this, the proposed SOLI (Siamese-Driven Optimization for Low-Resolution Image Latent Embedding in Image Captioning) approach presents a lightweight solution designed specifically for low-resolution image captioning. It employs a Siamese network architecture to optimize latent embeddings, enhancing the efficiency and accuracy of the image-to-text translation process. By focusing on a dual-pathway…
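
The abstract describes a Siamese architecture that optimizes latent embeddings for low-resolution inputs. The general idea can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that a single shared encoder embeds both a full-resolution image and a degraded copy of it, and that training would pull the two embeddings together. The function names, the toy linear `tanh` encoder, the pooling-based degradation, and the mean-squared-distance loss are all hypothetical choices made for illustration; the abstract specifies none of them.

```python
import numpy as np

def downsample_then_upsample(img, factor):
    """Simulate a low-resolution image (assumed degradation model):
    average-pool by `factor`, then repeat pixels back to the original size."""
    h, w = img.shape
    pooled = img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    return np.repeat(np.repeat(pooled, factor, axis=0), factor, axis=1)

def encode(img, weights):
    """Toy shared encoder (hypothetical): one linear layer followed by tanh.
    Both Siamese branches call this with the same weights."""
    return np.tanh(weights @ img.ravel())

def siamese_embedding_loss(high_res, low_res, weights):
    """Mean squared distance between the two branch embeddings (assumed loss)."""
    z_high = encode(high_res, weights)
    z_low = encode(low_res, weights)
    return float(np.mean((z_high - z_low) ** 2))

rng = np.random.default_rng(0)
image = rng.random((8, 8))                       # stand-in for a real image
weights = rng.normal(scale=0.1, size=(16, 64))   # 64 pixels -> 16-dim embedding

low_res = downsample_then_upsample(image, factor=4)
loss = siamese_embedding_loss(image, low_res, weights)
identical_loss = siamese_embedding_loss(image, image, weights)
```

In this sketch the loss is zero when both branches see the same image and non-negative otherwise, so minimizing it with respect to `weights` would encourage the low-resolution embedding to match the full-resolution one. A captioning decoder, which the sketch omits, would then consume that embedding.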

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Topic Modeling