PostDoc: Generating Poster from a Long Multimodal Document Using Deep Submodular Optimization

Vijay Jaisankar; Sambaran Bandyopadhyay; Kalp Vyas; Varre Chaitanya; Shwetha Somasundaram

arXiv:2405.20213 · cs.AI · May 31, 2024

TL;DR

This paper introduces a novel deep submodular function for extracting multimodal content from long documents to automatically generate well-designed posters, combining content summarization, template creation, and design harmonization.

Contribution

It presents a deep submodular optimization method for multimodal content extraction, trained on ground-truth summaries, and a template generation approach conditioned on the input content.

Findings

1. Outperforms existing methods in automated evaluations
2. Receives positive human evaluation results
3. Effectively balances coverage, diversity, and alignment

Abstract

A poster generated from a long input document can be viewed as a one-page, easy-to-read multimodal (text and images) summary presented on an attractive template with good design elements. Automatically transforming a long document into a poster is a challenging but little-studied task. It involves summarizing the content of the input document, followed by template generation and harmonization. In this work, we propose a novel deep submodular function that can be trained on ground-truth summaries to extract multimodal content from the document and that explicitly ensures good coverage, diversity, and alignment of text and images. We then use an LLM-based paraphraser and propose generating a template with various design aspects conditioned on the input content. We demonstrate the merits of our approach through extensive automated and human evaluations.
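The abstract frames content extraction as maximizing a submodular function that rewards coverage, diversity, and text-image alignment. The paper's function is learned ("deep"), and its exact form is not given on this page. The Python sketch below is therefore only an illustration under stated assumptions. It swaps in three classic, hand-written monotone submodular terms:

- facility-location coverage of the document's sentences;
- a square-root-per-cluster diversity reward;
- a facility-location term that measures how well the selected images illustrate the document's sentences, used as an alignment proxy.

The sketch combines these terms with fixed non-negative weights and maximizes the sum with the standard greedy algorithm under a cardinality budget. Every name (`objective`, `greedy_select`, the unit ids), embedding, cluster label, and weight here is hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Set, Tuple

Vector = Sequence[float]
# Hypothetical unit record: (modality "text" | "image", embedding, cluster id)
Unit = Tuple[str, Vector, int]


def cos_sim(a: Vector, b: Vector) -> float:
    """Cosine similarity clipped at 0 so the max-based terms stay monotone."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return max(0.0, dot / norm) if norm > 0 else 0.0


def objective(selected: Set[str], units: Dict[str, Unit],
              weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Hand-written stand-in for a learned poster objective (NOT the paper's function)."""
    w_cov, w_div, w_align = weights
    doc_text = [u for u, (m, _, _) in units.items() if m == "text"]
    sel_text = [u for u in selected if units[u][0] == "text"]
    sel_img = [u for u in selected if units[u][0] == "image"]

    # Coverage: each document sentence is represented by its most similar selected sentence.
    coverage = sum(max((cos_sim(units[t][1], units[s][1]) for s in sel_text), default=0.0)
                   for t in doc_text)

    # Diversity: sqrt of the number of picks per cluster, so a second pick from the
    # same cluster earns less than a first pick from a new cluster.
    counts: Dict[int, int] = {}
    for u in selected:
        counts[units[u][2]] = counts.get(units[u][2], 0) + 1
    diversity = sum(math.sqrt(c) for c in counts.values())

    # Alignment proxy (assumption): each document sentence is illustrated by its
    # most similar selected image.
    alignment = sum(max((cos_sim(units[t][1], units[i][1]) for i in sel_img), default=0.0)
                    for t in doc_text)

    return w_cov * coverage + w_div * diversity + w_align * alignment


def greedy_select(units: Dict[str, Unit], budget: int,
                  weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> List[str]:
    """Standard greedy for a monotone submodular objective under a cardinality budget."""
    selected: List[str] = []
    current = objective(set(), units, weights)
    for _ in range(min(budget, len(units))):
        best_unit: Optional[str] = None
        best_gain = float("-inf")
        for u in units:
            if u in selected:
                continue
            # Marginal gain of adding u to the current selection.
            gain = objective(set(selected) | {u}, units, weights) - current
            if gain > best_gain:
                best_unit, best_gain = u, gain
        if best_unit is None or best_gain <= 0.0:
            break  # nothing left improves the objective
        selected.append(best_unit)
        current = objective(set(selected), units, weights)
    return selected


# Toy document with hypothetical 2-D embeddings: cluster 0 ~ "method", cluster 1 ~ "results".
units: Dict[str, Unit] = {
    "t1": ("text", (1.0, 0.0), 0),
    "t2": ("text", (0.9, 0.1), 0),
    "t3": ("text", (0.0, 1.0), 1),
    "t4": ("text", (0.1, 0.9), 1),
    "fig_method": ("image", (1.0, 0.05), 0),
    "fig_results": ("image", (0.05, 1.0), 1),
}
picked = greedy_select(units, budget=4)
print(picked)
```

A non-negative weighted sum of monotone submodular functions is itself monotone submodular. This greedy loop therefore carries the classic (1 - 1/e) approximation guarantee for a cardinality constraint.

On the toy data, the loop selects both figures and one sentence from each cluster. Once a cluster is represented, the max-based terms and the square-root diversity reward give sharply diminishing returns for a second item from it.

In the paper, the function is instead trained on ground-truth summaries. This sketch does not attempt to reproduce that training.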

Taxonomy

Topics: Natural Language Processing Techniques · Speech and dialogue systems