CAPEEN: Image Captioning with Early Exits and Knowledge Distillation
Divya Jyoti Bajpai, Manjesh Kumar Hanawal

TL;DR
CAPEEN improves image-captioning efficiency by combining early-exit strategies with knowledge distillation. This enables faster inference while maintaining accuracy. The paper also introduces an adaptive variant for robustness in real-world deployments.
Contribution
The paper proposes CAPEEN, a novel method that combines early exits and knowledge distillation for efficient image captioning. It also introduces A-CAPEEN, which adapts the exit thresholds during deployment.
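To make "early exits plus knowledge distillation" concrete, here is a minimal sketch of one common way to distill a deep model into a shallow exit head. The final layer's token distribution acts as teacher, and an intermediate exit head acts as student. This is not CAPEEN's published loss, which the summary does not give. The blend of cross-entropy and temperature-scaled KL divergence is standard Hinton-style distillation. The names `exit_distillation_loss`, `temperature`, and `alpha` are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def exit_distillation_loss(
    exit_logits: Sequence[float],
    final_logits: Sequence[float],
    target: int,
    temperature: float = 2.0,
    alpha: float = 0.5,
) -> float:
    """Loss for one intermediate exit head at one decoding step (hypothetical).

    Mixes cross-entropy against the ground-truth token with
    KL(teacher || student) between temperature-softened distributions,
    where the final layer is the teacher and the exit head is the student.
    """
    ce = -math.log(softmax(exit_logits)[target])
    p_teacher = softmax(final_logits, temperature)
    p_student = softmax(exit_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    # Multiplying by T^2 keeps the distillation term's gradient scale
    # comparable to the cross-entropy term as the temperature changes.
    return alpha * ce + (1.0 - alpha) * temperature**2 * kl


final = [0.5, 2.5, -1.0]  # deepest layer's logits (teacher)
early = [0.2, 1.0, 0.1]   # shallow exit head's logits (student)
loss = exit_distillation_loss(early, final, target=1)
```

In a real training loop, this term would typically be summed over all exit heads and caption tokens. The teacher's logits would usually be detached so that only the exit heads learn from them.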
Findings
CAPEEN achieves a 1.77x speedup with competitive accuracy.
A-CAPEEN improves robustness to data distortions.
The method maintains performance while reducing inference latency.
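The findings report a 1.77x speedup, but the summary does not say how it is measured. The following back-of-the-envelope cost model shows how a speedup figure relates to where tokens exit. It makes several hypothetical assumptions. Every decoder layer costs one unit. Each exit head evaluated on the way to the exit adds `exit_head_cost` units. The speedup is full-depth cost divided by average early-exit cost. The function name and the sample exit depths are illustrative and do not come from the paper.

```python
from typing import Sequence


def estimated_speedup(
    exit_layers: Sequence[int], num_layers: int, exit_head_cost: float = 0.0
) -> float:
    """Full-depth cost divided by the average cost with early exits.

    Hypothetical cost model: each layer costs 1 unit, and each exit head
    evaluated up to the exit point adds `exit_head_cost` units.
    `exit_layers` holds the 1-based layer at which each token exited.
    """
    if not exit_layers:
        raise ValueError("exit_layers must be non-empty")
    if any(not 1 <= k <= num_layers for k in exit_layers):
        raise ValueError("exit layers must lie in [1, num_layers]")
    avg_cost = sum(k * (1.0 + exit_head_cost) for k in exit_layers) / len(exit_layers)
    return num_layers / avg_cost


# Four tokens exiting at layers 4, 8, 12, 8 of a 12-layer decoder:
# average depth 8, so the estimated speedup is 12 / 8 = 1.5.
speedup = estimated_speedup([4, 8, 12, 8], num_layers=12)
```

The `exit_head_cost` term shows why exit heads must be cheap. If every token runs to the last layer, the heads are pure overhead and the "speedup" drops below 1.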
Abstract
Deep neural networks (DNNs) have made significant progress in recognizing visual elements and generating descriptive text in image-captioning tasks. However, their improved performance comes at the cost of increased computational burden and inference latency. Early Exit (EE) strategies can improve their efficiency. Adapting them to image captioning is challenging, however, because accurate predictions require varying levels of semantic information. To overcome this, we introduce CAPEEN, which improves the performance of EE strategies using knowledge distillation. Inference in CAPEEN terminates at an intermediate layer if the prediction confidence exceeds a predefined threshold learned from the training data. Real-world deployments can see the target distribution drift from that of the training samples. To account for this, we introduce a variant, A-CAPEEN, that adapts the thresholds on the fly using…
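The abstract says that inference stops at an intermediate layer once prediction confidence exceeds a threshold. The following minimal sketch applies that rule to a single caption token under several assumptions. Confidence is taken to be the maximum softmax probability of an exit head, which the summary does not define. The threshold is a given number here, not one learned from training data. How A-CAPEEN adapts the threshold is elided in the source, so that mechanism is not modeled. The function `early_exit_token` and its generator-based interface are hypothetical.

```python
import math
from typing import Iterable, List, Optional, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over vocabulary logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_token(
    exit_logits: Iterable[Sequence[float]], threshold: float
) -> Tuple[int, int]:
    """Pick one caption token, stopping at the first confident exit.

    `exit_logits` yields the vocabulary logits of the exit head after each
    layer, in order (hypothetical interface). With a generator, deeper
    layers are only computed if no earlier exit was confident enough.
    Returns (token_id, 1-based exit layer).
    """
    last: Optional[Tuple[int, int]] = None
    for layer, logits in enumerate(exit_logits, start=1):
        probs = softmax(logits)
        confidence = max(probs)
        last = (probs.index(confidence), layer)
        if confidence >= threshold:
            return last  # confident enough: skip the remaining layers
    if last is None:
        raise ValueError("exit_logits yielded no layers")
    return last  # no exit fired: fall back to the deepest layer's prediction


# Toy 3-layer decoder over a 3-word vocabulary.
layers = [[0.1, 0.2, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 9.0]]
token, depth = early_exit_token(iter(layers), threshold=0.8)  # token 1, layer 2
```

Raising the threshold trades speed for accuracy. At 0.95, the same token in this toy example is decided only at layer 3. In a full captioning loop, this function would be called once per generated token, so different tokens in one caption can exit at different depths.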
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization
