Multi-Modal Summary Generation using Multi-Objective Optimization
Anubhav Jangra, Sriparna Saha, Adam Jatowt, Mohammad Hasanuzzaman

TL;DR
This paper introduces an extractive multi-modal summarization model that simultaneously optimizes intra-modality salience, cross-modal redundancy, and cross-modal similarity to produce summaries containing text, images, and videos.
Contribution
The paper presents a multi-objective optimization framework for extractive multi-modal summarization. Evaluated separately for each modality, it performs better than state-of-the-art approaches.
Findings
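The model performs better than state-of-the-art approaches when each modality is evaluated separately.
The output summary combines text, images, and videos, whereas most previous work covers only text and images.
Optimizing intra-modality salience, cross-modal redundancy, and cross-modal similarity simultaneously yields effective multi-modal output.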
Abstract
Significant development of communication technology over the past few years has motivated research in multi-modal summarization techniques. A majority of the previous works on multi-modal summarization focus on text and images. In this paper, we propose a novel extractive multi-objective optimization based model to produce a multi-modal summary containing text, images, and videos. Important objectives such as intra-modality salience, cross-modal redundancy and cross-modal similarity are optimized simultaneously in a multi-objective optimization framework to produce effective multi-modal output. The proposed model has been evaluated separately for different modalities, and has been found to perform better than state-of-the-art approaches.
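To make "optimized simultaneously" concrete, the following is a minimal sketch of Pareto dominance over three objective scores. It is not the authors' algorithm: the abstract does not say how the objectives are computed or searched, so the score tuples, the choice to treat redundancy as a quantity to minimize, and the names `dominates` and `pareto_front` are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical scores for one candidate summary:
# (salience, redundancy, cross_modal_similarity).
# Assumption: salience and similarity are maximized, redundancy is minimized.
Scores = Tuple[float, float, float]


def to_maximization(s: Scores) -> Scores:
    """Negate redundancy so that every objective is 'higher is better'."""
    salience, redundancy, similarity = s
    return (salience, -redundancy, similarity)


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    a_max, b_max = to_maximization(a), to_maximization(b)
    no_worse = all(x >= y for x, y in zip(a_max, b_max))
    strictly_better = any(x > y for x, y in zip(a_max, b_max))
    return no_worse and strictly_better


def pareto_front(candidates: List[Scores]) -> List[int]:
    """Return indices of candidates that no other candidate dominates."""
    return [
        i
        for i, c in enumerate(candidates)
        if not any(dominates(o, c) for j, o in enumerate(candidates) if j != i)
    ]


# Made-up scores for four candidate summaries (illustrative only).
candidates = [
    (0.9, 0.5, 0.6),
    (0.7, 0.2, 0.8),
    (0.6, 0.6, 0.5),
    (0.9, 0.5, 0.7),
]
front = pareto_front(candidates)
print(front)
```

A multi-objective optimizer returns a set of such non-dominated trade-offs rather than a single best candidate. Here candidates 1 and 3 survive because neither is better than the other on all three scores.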
