Paraphrasing Is All You Need for Novel Object Captioning
Cheng-Fu Yang, Yao-Hung Hubert Tsai, Wan-Cyuan Fan, Ruslan Salakhutdinov, Louis-Philippe Morency, Yu-Chiang Frank Wang

TL;DR
This paper introduces P2C, a two-stage paraphrasing-based framework for novel object captioning that improves caption quality without ground truth annotations by leveraging language models and cross-modality associations.
Contribution
The paper proposes a paraphrasing-based learning framework for novel object captioning (NOC) that improves caption fluency and content accuracy without requiring annotated captions for novel objects.
Findings
Reports state-of-the-art results on the nocaps and COCO Caption datasets.
Demonstrates the effectiveness of paraphrasing and cross-modality modules in NOC.
Shows the framework is flexible: its language model and its cross-modality association model can each be replaced with alternatives.
Abstract
Novel object captioning (NOC) aims to describe images containing objects without observing their ground truth captions during training. Due to the absence of caption annotation, captioning models cannot be directly optimized via sequence-to-sequence training or CIDEr optimization. As a result, we present Paraphrasing-to-Captioning (P2C), a two-stage learning framework for NOC, which would heuristically optimize the output captions via paraphrasing. With P2C, the captioning model first learns paraphrasing from a language model pre-trained on text-only corpus, allowing expansion of the word bank for improving linguistic fluency. To further enforce the output caption sufficiently describing the visual content of the input image, we perform self-paraphrasing for the captioning model with fidelity and adequacy objectives introduced. Since no ground truth captions are available for novel…
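The two stages described in the abstract can be pictured with a toy sketch. Everything below is an illustrative assumption rather than the authors' implementation: the function names are hypothetical, the paraphraser is a string substitution standing in for a pre-trained language model, and the fidelity and adequacy scores are simple word-overlap proxies, not the objectives defined in the paper.

```python
from typing import Callable, List, Set


def toy_paraphraser(text: str) -> str:
    # Hypothetical stand-in for a language model pre-trained on text only.
    return text.replace("a photo of", "a picture showing")


def fidelity(candidate: str, draft: str) -> float:
    # Toy proxy (assumption): Jaccard word overlap between rewrite and draft.
    a, b = set(candidate.split()), set(draft.split())
    return len(a & b) / len(a | b) if (a | b) else 0.0


def adequacy(candidate: str, objects: List[str]) -> float:
    # Toy proxy (assumption) for an image-text association score:
    # the share of detected objects that the caption mentions.
    words = set(candidate.split())
    return sum(o in words for o in objects) / len(objects) if objects else 0.0


def stage_one(draft: str, vocab: Set[str],
              paraphrase: Callable[[str], str]) -> str:
    # Stage 1: learn from the language model's paraphrase of the draft
    # caption, which expands the captioner's word bank.
    target = paraphrase(draft)
    vocab.update(target.split())
    return target


def stage_two(candidates: List[str], draft: str, objects: List[str]) -> str:
    # Stage 2: among self-paraphrases, prefer the one that stays close to
    # the draft (fidelity) and covers the visual content (adequacy).
    return max(candidates,
               key=lambda c: fidelity(c, draft) + adequacy(c, objects))


objects = ["zebra", "fence"]
draft = "a photo of zebra and fence"
vocab = set(draft.split())
target = stage_one(draft, vocab, toy_paraphraser)
best = stage_two([target, "a picture showing zebra"], draft, objects)
```

In this sketch no ground truth caption is consulted at any point: stage one only needs a text-only paraphraser, and stage two only needs scores computed from the draft caption and the detected objects, which mirrors the constraint the abstract describes.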
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling
