Dense Multitask Learning to Reconfigure Comics

Deblina Bhattacharjee; Sabine Süsstrunk; Mathieu Salzmann

arXiv:2307.08071·cs.CV·July 18, 2023

Open Access

TL;DR

This paper introduces a multi-task learning model that performs dense predictions on comic panels, enabling automated reconfiguration and transfer of comics across publication channels despite artistic diversity and limited annotations.

Contribution

The paper presents a novel multi-task learning approach with a vision transformer backbone for dense prediction in comics, leveraging unsupervised translation to overcome annotation scarcity.
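
As a rough illustration of the general idea (one shared encoder feeding several task-specific dense heads), here is a minimal pure-Python sketch. It is not the authors' architecture: a single linear projection with a ReLU stands in for the vision transformer backbone, and the task names `semantic` and `depth`, the patch size, the feature width, and all helper names are assumptions made for illustration only.

```python
import random


def patchify(image, patch):
    """Split an H x W image (list of rows) into flattened, non-overlapping patches."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch"
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tokens.append([image[top + dy][left + dx]
                           for dy in range(patch) for dx in range(patch)])
    return tokens


def unpatchify(tokens, h, w, patch, channels):
    """Reassemble per-patch predictions into a dense channels x H x W map."""
    out = [[[0.0] * w for _ in range(h)] for _ in range(channels)]
    per_row = w // patch
    for idx, tok in enumerate(tokens):
        top, left = (idx // per_row) * patch, (idx % per_row) * patch
        for c in range(channels):
            for dy in range(patch):
                for dx in range(patch):
                    out[c][top + dy][left + dx] = tok[(c * patch + dy) * patch + dx]
    return out


def linear(tokens, weights):
    """Apply an (in_dim x out_dim) weight matrix to every token."""
    return [[sum(x * w for x, w in zip(tok, col)) for col in zip(*weights)]
            for tok in tokens]


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class MultiTaskDenseModel:
    """Hypothetical toy model: shared encoder plus one dense head per task."""

    def __init__(self, patch=4, dim=8, task_channels=None, seed=0):
        rng = random.Random(seed)
        self.patch = patch
        # Assumed tasks: a 3-class semantic map and a 1-channel depth map.
        self.task_channels = task_channels or {"semantic": 3, "depth": 1}
        # Shared encoder (stand-in for a transformer backbone).
        self.shared = random_matrix(rng, patch * patch, dim)
        # One head per task, each predicting a full patch of output pixels.
        self.heads = {name: random_matrix(rng, dim, ch * patch * patch)
                      for name, ch in self.task_channels.items()}

    def forward(self, image):
        h, w = len(image), len(image[0])
        tokens = patchify(image, self.patch)
        # Shared features are computed once and reused by every task head.
        features = [[max(0.0, v) for v in tok] for tok in linear(tokens, self.shared)]
        return {name: unpatchify(linear(features, head), h, w, self.patch,
                                 self.task_channels[name])
                for name, head in self.heads.items()}


model = MultiTaskDenseModel()
panel = [[(x + y) / 16.0 for x in range(8)] for y in range(8)]  # toy 8x8 "panel"
outputs = model.forward(panel)
```

Each entry of `outputs` is a dense map with the same height and width as the input panel. In a trained system the heads would be optimized jointly, typically by summing per-task losses, so that the shared features serve every task.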

Findings

01 Successfully identifies semantic units and 3D notions in comic panels

02 Enables reconfiguration of comics through integration with retargeting methods

03 Demonstrates domain transferability across diverse artistic styles

Abstract

In this paper, we develop a MultiTask Learning (MTL) model to achieve dense predictions for comics panels to, in turn, facilitate the transfer of comics from one publication channel to another by assisting authors in the task of reconfiguring their narratives. Our MTL method can successfully identify the semantic units as well as the embedded notion of 3D in comic panels. This is a significantly challenging problem because comics comprise disparate artistic styles, illustrations, layouts, and object scales that depend on the authors' creative process. Typically, dense image-based prediction techniques require a large corpus of data. Finding an automated solution for dense prediction in the comics domain, therefore, becomes more difficult with the lack of ground-truth dense annotations for comics images. To address these challenges, we develop the following solutions: 1) we leverage a…

Taxonomy

Topics: Comics and Graphic Narratives · Digital Storytelling and Education · Multimodal Machine Learning Applications

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Residual Connection · Layer Normalization · Dense Connections · Vision Transformer