Summary-Oriented Vision Modeling for Multimodal Abstractive Summarization

Yunlong Liang; Fandong Meng; Jinan Xu; Jiaan Wang; Yufeng Chen; Jie Zhou

arXiv:2212.07672 · cs.CV · May 5, 2023 · 1 cites

Open Access 1 Repo

TL;DR

This paper introduces a novel approach for multimodal abstractive summarization that emphasizes summary-oriented visual features, utilizing auxiliary tasks to enhance performance across diverse resource scenarios and establishing a new multilingual dataset.

Contribution

It proposes a new training framework with auxiliary tasks to capture summary-oriented visual features, improving MAS performance especially in low-resource settings.

Findings

01. Achieves state-of-the-art results across 44 languages.

02. Effective in low- and zero-resource scenarios.

03. Provides a large-scale multilingual multimodal dataset.

Abstract

Multimodal abstractive summarization (MAS) aims to produce a concise summary given the multimodal data (text and vision). Existing studies mainly focus on how to effectively use the visual features from the perspective of an article, having achieved impressive success on the high-resource English dataset. However, less attention has been paid to the visual features from the perspective of the summary, which may limit the model performance, especially in the low- and zero-resource scenarios. In this paper, we propose to improve the summary quality through summary-oriented visual features. To this end, we devise two auxiliary tasks including vision to summary task and masked image modeling task. Together with the main summarization task, we optimize the MAS model via the training objectives of all these tasks. By these means, the MAS model can be enhanced by capturing the summary-oriented…
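The abstract states that the model is optimized with the objectives of the main summarization task and the two auxiliary tasks together. A minimal sketch of one common way to combine such objectives is a weighted sum of the per-task losses. The function name `joint_loss` and the weights `alpha` and `beta` are hypothetical and do not come from the paper, which may weight or schedule the tasks differently.

```python
def joint_loss(summ_loss: float,
               v2s_loss: float,
               mim_loss: float,
               alpha: float = 1.0,
               beta: float = 1.0) -> float:
    """Combine the three per-task losses into one training objective.

    summ_loss: loss of the main summarization task
    v2s_loss:  loss of the vision-to-summary auxiliary task
    mim_loss:  loss of the masked image modeling auxiliary task
    alpha, beta: hypothetical weights for the auxiliary tasks
                 (an assumption for illustration, not from the paper)
    """
    return summ_loss + alpha * v2s_loss + beta * mim_loss


# Usage: down-weight the vision-to-summary loss, up-weight the MIM loss.
total = joint_loss(2.0, 1.0, 0.5, alpha=0.5, beta=2.0)
print(total)
```

With both weights set to 1.0 this reduces to a plain sum of the three losses, and setting a weight to 0.0 disables the corresponding auxiliary task.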

Code & Models

Repositories

xl2248/sov-mas · jax · Official

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text and Document Classification Technologies

Methods: Mixing Adam and SGD