Fusing Multimodal Signals on Hyper-complex Space for Extreme Abstractive Text Summarization (TL;DR) of Scientific Contents

Yash Kumar Atri; Vikram Goyal; Tanmoy Chakraborty

arXiv:2306.13968 · cs.CL · June 27, 2023

TL;DR

This paper introduces a novel multimodal dataset and a hyper-complex Transformer model for extreme abstractive summarization of scientific content, leveraging videos, audio, and text to generate concise summaries.

Contribution

The paper presents the first dataset for multimodal extreme abstractive summarization and a novel hyper-complex Transformer model that effectively captures modality interactions in a geometric space.
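
As a rough illustration of what operating in a hyper-complex space can mean, the sketch below implements the Hamilton product of two quaternions. The summary above does not say which hyper-complex algebra the model uses, so the choice of quaternions and the name `hamilton_product` are assumptions made purely for illustration, not the paper's method.

```python
def hamilton_product(p, q):
    """Multiply two quaternions given as (w, x, y, z) tuples.

    Assumption: quaternions stand in for "hyper-complex space" here;
    the source does not specify the algebra actually used.
    """
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (
        pw * qw - px * qx - py * qy - pz * qz,  # real part
        pw * qx + px * qw + py * qz - pz * qy,  # i component
        pw * qy - px * qz + py * qw + pz * qx,  # j component
        pw * qz + px * qy - py * qx + pz * qw,  # k component
    )


# The product is not commutative: i * j = k, but j * i = -k.
i = (0, 1, 0, 0)
j = (0, 0, 1, 0)
print(hamilton_product(i, j))
print(hamilton_product(j, i))
```

Every output component depends on all four components of both inputs, which is one reason hyper-complex multiplications are sometimes used to mix several feature streams in a single operation.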

Findings

01. mTLDRgen outperforms 20 baselines on ROUGE scores

02. Generated summaries are fluent and source-congruent

03. Model effectively captures multimodal interactions
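
For readers unfamiliar with ROUGE, the sketch below computes a simplified ROUGE-1 F1 score from unigram overlap. It is a minimal approximation under stated assumptions: whitespace tokenization, lowercasing, and no stemming. The function name `rouge1_f1` is hypothetical, and the summary above does not state which ROUGE variants or which implementation the paper used.

```python
from collections import Counter


def rouge1_f1(candidate, reference):
    """Simplified ROUGE-1 F1: clipped unigram overlap between two strings.

    Assumption: whitespace tokenization and lowercasing only; standard
    ROUGE toolkits may add stemming and other preprocessing.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per token (clipping).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


print(rouge1_f1("the cat sat", "the cat"))
```

Here the candidate shares two of its three tokens with the two-token reference, so precision is 2/3, recall is 1, and the F1 score is 0.8 up to floating-point rounding.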

Abstract

The realm of scientific text summarization has experienced remarkable progress due to the availability of annotated brief summaries and ample data. However, the utilization of multiple input modalities, such as videos and audio, has yet to be thoroughly explored. At present, scientific multimodal-input-based text summarization systems tend to employ longer target summaries like abstracts, leading to underwhelming performance in the task of text summarization. In this paper, we deal with a novel task of extreme abstractive text summarization (aka TL;DR generation) by leveraging multiple input modalities. To this end, we introduce mTLDR, a first-of-its-kind dataset for the aforementioned task, comprising videos, audio, and text, along with both author-composed summaries and expert-annotated summaries. The mTLDR dataset comprises a total of 4,182 instances collected from various…

Code & Models

Repositories

lcs2-iiitd/mtldrgen (Official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Multi-Head Attention · Attention Is All You Need · Absolute Position Encodings · Linear Layer · Layer Normalization · Position-Wise Feed-Forward Layer · Dense Connections · Label Smoothing · Adam · Byte Pair Encoding