Hierarchical Cross-Modality Semantic Correlation Learning Model for Multimodal Summarization

Litian Zhang; Xiaoming Zhang; Junshu Pan; Feiran Huang

arXiv:2112.12072·cs.CV·December 23, 2021·1 cites

TL;DR

This paper introduces a hierarchical cross-modality semantic correlation learning model (HCSCL) for multimodal summarization. The model captures intra-modal correlations and the hierarchical inter-modal correlations between text and images in order to improve summary quality.

Contribution

The paper proposes a novel HCSCL model that encodes intra-modal and hierarchical cross-modal correlations using graph networks and a hierarchical fusion framework.

Findings

01 HCSCL outperforms baseline methods on automatic metrics.

02 The model achieves higher diversity in generated summaries.

03 Extensive experiments validate the effectiveness of the approach.

Abstract

Multimodal summarization with multimodal output (MSMO) generates a summary with both textual and visual content. Multimodal news reports contain heterogeneous content, which makes MSMO nontrivial. Moreover, different modalities of data in a news report are observed to correlate hierarchically. Traditional MSMO methods handle the different modalities indiscriminately by learning a single representation for the whole data, which does not adapt well to the heterogeneous content and hierarchical correlation. In this paper, we propose a hierarchical cross-modality semantic correlation learning model (HCSCL) to learn the intra- and inter-modal correlation existing in the multimodal data. HCSCL adopts a graph network to encode the intra-modal correlation. Then, a hierarchical fusion framework is proposed to learn the hierarchical correlation between text and images. Furthermore, we…
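
The two stages named in the abstract (a graph network for intra-modal correlation, then hierarchical fusion across text and images) can be illustrated with a toy sketch. This is not the authors' implementation. Everything below is an assumption made for illustration: the two fusion levels are labeled word/object and sentence/scene, the graph step is plain mean-neighbor message passing, the fusion is unparameterized dot-product attention, and all function names and toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def graph_step(nodes, edges):
    """One round of mean-neighbor message passing (self-loop included).

    Stand-in for the intra-modal graph encoder; the real model is learned.
    """
    updated = []
    for i, h in enumerate(nodes):
        group = [h] + [nodes[j] for j in edges.get(i, [])]
        updated.append([sum(col) / len(group) for col in zip(*group)])
    return updated


def cross_attend(queries, keys):
    """Fuse each query with an attention-weighted mix of the key vectors."""
    fused = []
    for q in queries:
        weights = softmax([dot(q, k) for k in keys])
        context = [sum(w * k[d] for w, k in zip(weights, keys))
                   for d in range(len(q))]
        fused.append([(a + b) / 2 for a, b in zip(q, context)])
    return fused


def mean_pool(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


# Hypothetical toy features: three word vectors and two image-object vectors.
words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
word_edges = {0: [1], 1: [0, 2], 2: [1]}
objects = [[0.5, 0.5], [1.0, 0.0]]
object_edges = {0: [1], 1: [0]}

# Stage 1: intra-modal correlation, one graph per modality.
words_g = graph_step(words, word_edges)
objects_g = graph_step(objects, object_edges)

# Stage 2a: low-level fusion (assumed word/object level).
words_f = cross_attend(words_g, objects_g)

# Stage 2b: high-level fusion (assumed sentence/scene level).
sentence = mean_pool(words_f)
scene = mean_pool(objects_g)
sentence_f = cross_attend([sentence], [scene])[0]
```

The point of the sketch is the ordering: each modality is first refined within itself, and cross-modal fusion then happens once at a fine granularity and again on pooled representations, rather than on a single representation of the whole input.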

Taxonomy

Topics: Topic Modeling · Computational and Text Analysis Methods · Text and Document Classification Technologies