# COMIC: Towards A Compact Image Captioning Model with Attention

**Authors:** Jia Huei Tan, Chee Seng Chan, Joon Huang Chuah

arXiv: 1903.01072 · 2019-06-13

## TL;DR

COMIC is a compact attention-based image captioning model that achieves results comparable to state-of-the-art approaches with an embedding vocabulary 39x to 99x smaller, aimed at embedded systems with limited hardware resources.

## Contribution

The paper addresses the compactness of image captioning models, a problem it describes as previously unexplored, and proposes COMIC (COMpact Image Captioning), which achieves results comparable to state-of-the-art methods with a much smaller embedding vocabulary.

## Key findings

- Achieves similar performance to state-of-the-art models on MS-COCO and InstaPIC-1.1M datasets.
- The embedding vocabulary is 39x to 99x smaller, while results remain comparable in five common evaluation metrics.
- Shrinks the word and output embedding matrices, whose size otherwise grows with the vocabulary, to make deployment on resource-limited devices more practical.

## Abstract

Recent works in image captioning have shown very promising raw performance. However, we realize that most of these encoder-decoder style networks with attention do not scale naturally to large vocabulary sizes, making them difficult to deploy on embedded systems with limited hardware resources. This is because the sizes of the word and output embedding matrices grow proportionally with the size of the vocabulary, adversely affecting the compactness of these networks. To address this limitation, this paper introduces a brand new idea in the domain of image captioning: we tackle the problem of compactness of image captioning models, which is hitherto unexplored. We show that our proposed model, named COMIC for COMpact Image Captioning, achieves comparable results in five common evaluation metrics with state-of-the-art approaches on both MS-COCO and InstaPIC-1.1M datasets despite having an embedding vocabulary size that is 39x - 99x smaller. The source code and models are available at: https://github.com/jiahuei/COMIC-Compact-Image-Captioning-with-Attention
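The abstract's scaling argument can be made concrete with a little arithmetic. The sketch below counts the parameters in a word embedding matrix and an output projection for a given vocabulary size. The vocabulary size (10,000) and the 512-dimensional embedding and hidden sizes are illustrative assumptions, not figures from the paper, and `embedding_params` is a hypothetical helper. It shows only why the count is linear in vocabulary size, not how COMIC achieves its reduction.

```python
def embedding_params(vocab_size: int, embed_dim: int, hidden_dim: int) -> int:
    """Count parameters in the word embedding and output projection.

    Both terms scale linearly with vocab_size, which is the scaling
    behaviour the abstract refers to.
    """
    word_embedding = vocab_size * embed_dim                    # input lookup table
    output_projection = hidden_dim * vocab_size + vocab_size   # weights + bias
    return word_embedding + output_projection


# Illustrative sizes only (assumptions, not taken from the paper).
full_vocab = 10_000
small_vocab = full_vocab // 39  # a 39x smaller embedding vocabulary -> 256

full = embedding_params(full_vocab, embed_dim=512, hidden_dim=512)
compact = embedding_params(small_vocab, embed_dim=512, hidden_dim=512)

print(f"full vocabulary:    {full:,} parameters")
print(f"compact vocabulary: {compact:,} parameters")
print(f"reduction factor:   {full / compact:.2f}x")
```

Because both matrices have one dimension equal to the vocabulary size, shrinking the vocabulary by some factor shrinks these two matrices by about the same factor, independent of the rest of the network.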

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1903.01072/full.md

## Figures

15 figures with captions in the complete paper: https://tomesphere.com/paper/1903.01072/full.md

## References

52 references — full list in the complete paper: https://tomesphere.com/paper/1903.01072/full.md

---
Source: https://tomesphere.com/paper/1903.01072