Dense Multitask Learning to Reconfigure Comics
Deblina Bhattacharjee, Sabine Süsstrunk, Mathieu Salzmann

TL;DR
This paper introduces a multi-task learning model that performs dense predictions on comic panels, enabling automated reconfiguration and transfer of comics across publication channels despite artistic diversity and limited annotations.
Contribution
The paper presents a novel multi-task learning approach with a vision transformer backbone for dense prediction in comics, leveraging unsupervised translation to overcome annotation scarcity.
Findings
Successfully identifies semantic units and 3D notions in comic panels
Enables reconfiguration of comics through integration with retargeting methods
Demonstrates domain transferability across diverse artistic styles
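A multi-task model that predicts semantic units and a 3D cue for the same panel is typically trained on a combined objective. The following is a minimal sketch of one common choice, a weighted sum of a per-pixel segmentation loss and a per-pixel depth loss. It is not taken from the paper: the choice of cross-entropy and L1 losses, the function names, and the weights `w_seg` and `w_depth` are all assumptions made for illustration.

```python
import math

def cross_entropy(logits, label):
    """Cross-entropy for one pixel; logits is a list of class scores."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def segmentation_loss(pixel_logits, pixel_labels):
    """Mean per-pixel cross-entropy over a flattened panel."""
    losses = [cross_entropy(l, y) for l, y in zip(pixel_logits, pixel_labels)]
    return sum(losses) / len(losses)

def depth_loss(pred_depth, target_depth):
    """Mean absolute error over a flattened depth map."""
    errors = [abs(p - t) for p, t in zip(pred_depth, target_depth)]
    return sum(errors) / len(errors)

def multitask_loss(pixel_logits, pixel_labels, pred_depth, target_depth,
                   w_seg=1.0, w_depth=1.0):
    """Weighted sum of the two dense-task losses (weights are hypothetical)."""
    return (w_seg * segmentation_loss(pixel_logits, pixel_labels)
            + w_depth * depth_loss(pred_depth, target_depth))

# Toy two-pixel "panel": uniform logits over two classes, one depth error of 0.5.
loss = multitask_loss(
    pixel_logits=[[0.0, 0.0], [0.0, 0.0]],
    pixel_labels=[0, 1],
    pred_depth=[1.0, 2.0],
    target_depth=[1.5, 2.0],
)
```

In practice both predictions would come from task-specific heads on a shared backbone, so a single backward pass through this sum updates the shared features for both tasks.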
Abstract
In this paper, we develop a MultiTask Learning (MTL) model to achieve dense predictions for comics panels to, in turn, facilitate the transfer of comics from one publication channel to another by assisting authors in the task of reconfiguring their narratives. Our MTL method can successfully identify the semantic units as well as the embedded notion of 3D in comic panels. This is a significantly challenging problem because comics comprise disparate artistic styles, illustrations, layouts, and object scales that depend on the author's creative process. Typically, dense image-based prediction techniques require a large corpus of data. Finding an automated solution for dense prediction in the comics domain, therefore, becomes more difficult with the lack of ground-truth dense annotations for the comics images. To address these challenges, we develop the following solutions: 1) we leverage a…
Taxonomy
Topics: Comics and Graphic Narratives · Digital Storytelling and Education · Multimodal Machine Learning Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Residual Connection · Layer Normalization · Dense Connections · Vision Transformer
