Dependent Multi-Task Learning with Causal Intervention for Image Captioning

Wenqing Chen; Jidong Tian; Caoyun Fan; Hao He; and Yaohui Jin

arXiv:2105.08573·cs.LG·May 19, 2021·1 cites

TL;DR

This paper introduces a multi-task learning framework with causal intervention for image captioning. It addresses content inconsistency and insufficient informativeness in generated captions by adding an intermediate task and applying Pearl's do-calculus.

Contribution

It proposes a novel dependent multi-task learning framework with causal intervention (DMTCI) that improves caption quality by reducing spurious correlations and enhancing visual understanding.

Findings

01

Outperforms baseline models on standard benchmarks.

02

Achieves competitive results with state-of-the-art methods.

03

Effectively reduces content inconsistency in generated captions.

Abstract

Recent work on image captioning has mainly followed an extract-then-generate paradigm, pre-extracting a sequence of object-based features and then formulating image captioning as a single sequence-to-sequence task. Although this approach is promising, we observe two problems in generated captions: 1) content inconsistency, where models generate contradictory facts; and 2) insufficient informativeness, where models miss parts of important information. From a causal perspective, the reason is that models have captured spurious statistical correlations between visual features and certain expressions (e.g., visual features of "long hair" and "woman"). In this paper, we propose a dependent multi-task learning framework with causal intervention (DMTCI). First, we introduce an intermediate task, bag-of-categories generation, before the final task, image captioning. The intermediate task would help the model…
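The spurious-correlation argument in the abstract can be made concrete with a toy calculation. The sketch below is not the paper's model: it is a generic backdoor adjustment over a single hypothetical binary confounder `Z`, with every probability invented for illustration. `X` stands for a visual feature being present (such as "long hair") and `Y` for a caption containing a given expression (such as "woman"). It assumes `Z` is observed and satisfies the backdoor criterion for the effect of `X` on `Y`.

```python
# Toy backdoor adjustment. All numbers are made up for illustration only;
# Z, X, and Y are hypothetical variables, not quantities from the paper.

P_Z = {0: 0.5, 1: 0.5}                  # P(Z = z)
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}         # P(X = 1 | Z = z)
P_Y1_GIVEN_XZ = {                       # P(Y = 1 | X = x, Z = z), keyed by (x, z)
    (1, 0): 0.3, (1, 1): 0.9,
    (0, 0): 0.2, (0, 1): 0.8,
}


def p_x_given_z(x, z):
    """Return P(X = x | Z = z) for binary x."""
    return P_X1_GIVEN_Z[z] if x == 1 else 1.0 - P_X1_GIVEN_Z[z]


def observational(x):
    """P(Y = 1 | X = x): each z is weighted by P(z | X = x), so Z leaks in."""
    joint = {z: p_x_given_z(x, z) * P_Z[z] for z in P_Z}   # P(X = x, Z = z)
    p_x = sum(joint.values())                              # P(X = x)
    return sum(P_Y1_GIVEN_XZ[(x, z)] * joint[z] for z in P_Z) / p_x


def interventional(x):
    """P(Y = 1 | do(X = x)) = sum_z P(Y = 1 | X = x, z) * P(z)."""
    return sum(P_Y1_GIVEN_XZ[(x, z)] * P_Z[z] for z in P_Z)


if __name__ == "__main__":
    obs_gap = observational(1) - observational(0)
    do_gap = interventional(1) - interventional(0)
    print(f"observational gap:  {obs_gap:.2f}")
    print(f"interventional gap: {do_gap:.2f}")
```

With these invented numbers, the observed association between `X` and `Y` (a gap of about 0.46) is much larger than the effect that remains after adjusting for `Z` (about 0.10). That is the kind of gap a model exploits when it learns a spurious correlation.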

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning