Multi-Modal Summary Generation using Multi-Objective Optimization

Anubhav Jangra; Sriparna Saha; Adam Jatowt; Mohammad Hasanuzzaman

arXiv:2005.09252·cs.IR·May 20, 2020

TL;DR

This paper introduces an extractive multi-modal summarization model that simultaneously optimizes intra-modality salience, cross-modal redundancy, and cross-modal similarity to generate summaries containing text, images, and videos.

Contribution

It presents a multi-objective optimization framework for multi-modal summarization that, when evaluated separately on each modality, outperforms state-of-the-art methods.

Findings

01

The model outperforms state-of-the-art approaches when evaluated separately on each modality.

02

Summaries combine text, images, and videos, whereas most prior work covers only text and images.

03

Optimizing intra-modality salience, cross-modal redundancy, and cross-modal similarity simultaneously yields effective multi-modal summaries.

Abstract

Significant development of communication technology over the past few years has motivated research in multi-modal summarization techniques. A majority of the previous works on multi-modal summarization focus on text and images. In this paper, we propose a novel extractive multi-objective optimization based model to produce a multi-modal summary containing text, images, and videos. Important objectives such as intra-modality salience, cross-modal redundancy and cross-modal similarity are optimized simultaneously in a multi-objective optimization framework to produce effective multi-modal output. The proposed model has been evaluated separately for different modalities, and has been found to perform better than state-of-the-art approaches.
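
To make the idea of optimizing several objectives at once concrete, here is a minimal Python sketch that scores candidate summaries on three objectives and keeps only the Pareto-optimal ones. It is not the authors' model: the toy embeddings, the cosine-based definitions of salience, redundancy, and similarity, and the names `objectives`, `dominates`, and `pareto_front` are all illustrative assumptions, and exhaustive enumeration stands in for whatever search procedure the paper actually uses.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def objectives(text_sel, image_sel, text_vecs, image_vecs):
    """Return (salience, -redundancy, similarity); all three are maximized.

    Assumed definitions, chosen only for illustration:
      salience   = mean similarity of each selected item to its modality centroid
      redundancy = mean pairwise similarity among selected items of one modality
      similarity = mean similarity between selected sentences and selected images
    """
    t = [text_vecs[i] for i in text_sel]
    m = [image_vecs[j] for j in image_sel]
    t_center, m_center = centroid(text_vecs), centroid(image_vecs)
    salience = mean([cosine(v, t_center) for v in t] +
                    [cosine(v, m_center) for v in m])
    redundancy = mean([cosine(a, b) for a, b in combinations(t, 2)] +
                      [cosine(a, b) for a, b in combinations(m, 2)])
    similarity = mean(cosine(a, b) for a in t for b in m)
    return (salience, -redundancy, similarity)


def dominates(a, b):
    """True if score a is at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scored):
    """Keep the candidates whose score is not dominated by any other score."""
    return [cand for cand, score in scored
            if not any(dominates(other, score) for _, other in scored)]


# Toy 2-D embeddings (hypothetical data): three sentences and two images.
text_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
image_vecs = [[1.0, 0.1], [0.1, 1.0]]

# Enumerate every summary made of two sentences and one image.
candidates = [(ts, (j,))
              for ts in combinations(range(len(text_vecs)), 2)
              for j in range(len(image_vecs))]
scored = [(c, objectives(c[0], c[1], text_vecs, image_vecs)) for c in candidates]
front = pareto_front(scored)

for text_sel, image_sel in front:
    print("sentences", text_sel, "images", image_sel)
```

The result is a set of trade-off summaries rather than a single winner, which is the usual outcome of multi-objective optimization; choosing one summary from the front requires an extra selection rule that this sketch leaves out.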
