Self-Supervised Multimodal Opinion Summarization

Jinbae Im; Moonki Kim; Hoyeop Lee; Hyunsouk Cho; Sehee Chung

arXiv:2105.13135·cs.CL·May 28, 2021

TL;DR

This paper introduces MultimodalSum, a self-supervised framework that leverages both text and non-text review data, including images and metadata, to generate more comprehensive opinion summaries.

Contribution

It proposes a novel multimodal training pipeline with separate encoders for each modality and end-to-end fusion, enhancing opinion summarization with non-text data.

Findings

1. MultimodalSum outperforms text-only models on Yelp and Amazon datasets.
2. Pretraining on individual modalities improves overall summarization quality.
3. Incorporating non-text data significantly enhances summary informativeness.

Abstract

Recently, opinion summarization, which is the generation of a summary from multiple reviews, has been conducted in a self-supervised manner by considering a sampled review as a pseudo summary. However, non-text data such as image and metadata related to reviews have been considered less often. To use the abundant information contained in non-text data, we propose a self-supervised multimodal opinion summarization framework called MultimodalSum. Our framework obtains a representation of each modality using a separate encoder for each modality, and the text decoder generates a summary. To resolve the inherent heterogeneity of multimodal data, we propose a multimodal training pipeline. We first pretrain the text encoder-decoder based solely on text modality data. Subsequently, we pretrain the non-text modality encoders by considering the pretrained text decoder as a pivot for the…
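The self-supervised setup mentioned in the abstract, where a sampled review stands in as a pseudo summary, can be illustrated with a minimal sketch. The function name `make_pseudo_pair` is hypothetical, and uniform random sampling is an assumption made for illustration; the paper may select the pseudo summary differently.

```python
import random
from typing import List, Tuple


def make_pseudo_pair(reviews: List[str], rng: random.Random) -> Tuple[List[str], str]:
    """Hold out one review as the pseudo summary; the rest become the input.

    Assumption: the held-out review is sampled uniformly at random.
    """
    if len(reviews) < 2:
        raise ValueError("need at least two reviews to form a training pair")
    idx = rng.randrange(len(reviews))
    target = reviews[idx]                        # pseudo summary (training target)
    sources = reviews[:idx] + reviews[idx + 1:]  # remaining reviews (model input)
    return sources, target


# Toy reviews for a single business or product (made-up data).
reviews = [
    "Great pasta and friendly staff.",
    "The pasta was good but the wait was long.",
    "Cozy place, nice service.",
]
sources, target = make_pseudo_pair(reviews, random.Random(0))
```

A summarizer trained on such pairs learns to produce the held-out review from the others, so no human-written summaries are needed.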

Code & Models

Repositories

nc-ai/MultimodalSum (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques