ACORT: A Compact Object Relation Transformer for Parameter Efficient Image Captioning

Jia Huei Tan; Ying Hua Tan; Chee Seng Chan; Joon Huang Chuah

arXiv:2202.05451 · cs.CV · February 14, 2022

TL;DR

ACORT introduces three parameter reduction techniques for Transformer-based image captioning. The resulting models have 3.7x to 21.6x fewer parameters than the baseline while remaining competitive on MS-COCO.

Contribution

The paper proposes a novel combination of three parameter reduction methods for Transformer models in image captioning: Radix Encoding, cross-layer parameter sharing, and attention parameter sharing.
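
As a rough illustration of the first method, the sketch below assumes that Radix Encoding means writing each vocabulary index as a fixed number of digits in a smaller base, so that the embedding table needs only one row per digit value rather than one row per word. The function names, the base of 256, and the vocabulary size are hypothetical choices for illustration and are not taken from the paper or its repository.

```python
def radix_encode(token_id, base, num_digits):
    """Write token_id as num_digits digits in the given base, most significant first."""
    if token_id < 0 or token_id >= base ** num_digits:
        raise ValueError("token_id does not fit in the requested number of digits")
    digits = []
    for _ in range(num_digits):
        token_id, digit = divmod(token_id, base)
        digits.append(digit)
    return digits[::-1]


def radix_decode(digits, base):
    """Inverse of radix_encode: rebuild the original token id from its digits."""
    value = 0
    for digit in digits:
        value = value * base + digit
    return value


# Illustrative numbers only (assumptions, not values reported in the paper).
vocab_size = 10000
base = 256
num_digits = 2          # 256 ** 2 = 65536, enough to cover 10000 ids
embed_dim = 512

encoded = radix_encode(9999, base, num_digits)
decoded = radix_decode(encoded, base)

# Embedding rows needed: one per word versus one per digit value.
word_embedding_params = vocab_size * embed_dim
radix_embedding_params = base * embed_dim
```

Under these assumed numbers the embedding table shrinks from 10000 rows to 256 rows, at the cost of each word occupying two positions in the token sequence.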

Findings

01. ACORT models have 3.7x to 21.6x fewer parameters than the baseline model.

02. ACORT models achieve CIDEr scores >=126 on MS-COCO.

03. Performance remains competitive with baselines and SOTA approaches despite the parameter reduction.

Abstract

Recent research that applies Transformer-based architectures to image captioning has resulted in state-of-the-art image captioning performance, capitalising on the success of Transformers on natural language tasks. Unfortunately, though these models work well, one major flaw is their large model sizes. To this end, we present three parameter reduction methods for image captioning Transformers: Radix Encoding, cross-layer parameter sharing, and attention parameter sharing. By combining these methods, our proposed ACORT models have 3.7x to 21.6x fewer parameters than the baseline model without compromising test performance. Results on the MS-COCO dataset demonstrate that our ACORT models are competitive against baselines and SOTA approaches, with CIDEr score >=126. Finally, we present qualitative results and ablation studies to demonstrate the efficacy of the proposed changes further.…
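
To make the effect of cross-layer parameter sharing concrete, the following sketch counts the weights in a simplified Transformer layer stack with and without sharing. It assumes a generic layer (four attention projections, a two-layer feed-forward block, two layer norms) and that sharing reuses a single set of layer weights at every depth. The layer layout, sizes, and function names are illustrative assumptions, and the actual ACORT sharing scheme may differ.

```python
def layer_params(d_model, d_ff):
    """Parameter count of one simplified Transformer layer (weights plus biases)."""
    # Query, key, value and output projections, each d_model x d_model with a bias.
    attention = 4 * (d_model * d_model + d_model)
    # Position-wise feed-forward block: d_model -> d_ff -> d_model.
    feed_forward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    # Two layer norms, each with a scale and a shift vector.
    layer_norms = 2 * 2 * d_model
    return attention + feed_forward + layer_norms


def stack_params(num_layers, d_model, d_ff, share_across_layers):
    """Parameter count of a layer stack, optionally reusing one layer's weights."""
    unique_layers = 1 if share_across_layers else num_layers
    return unique_layers * layer_params(d_model, d_ff)


# Illustrative sizes only (assumptions, not the paper's configuration).
unshared = stack_params(num_layers=6, d_model=512, d_ff=2048, share_across_layers=False)
shared = stack_params(num_layers=6, d_model=512, d_ff=2048, share_across_layers=True)
print(f"unshared: {unshared:,}  shared: {shared:,}  ratio: {unshared / shared:.1f}x")
```

In this simplified setting the stack shrinks by a factor equal to the number of layers. The reductions reported in the abstract come from combining all three methods, so this ratio should not be read as a reproduction of those figures.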

Code & Models

Repositories

jiahuei/sparse-image-captioning (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning