The Amazing Mysteries of the Gutter: Drawing Inferences Between Panels in Comic Book Narratives

Mohit Iyyer; Varun Manjunatha; Anupam Guha; Yogarshi Vyas; Jordan Boyd-Graber; Hal Daumé III; Larry Davis

arXiv:1611.05118 · cs.CV · May 9, 2017 · 1 citation

TL;DR

This paper investigates whether AI can understand comic book narratives by analyzing a large dataset of panels and introducing tasks that require integrating visual and textual information to infer story continuity.

Contribution

The paper introduces the COMICS dataset and proposes new multimodal tasks to evaluate AI understanding of comic narratives, highlighting current limitations.

Findings

01. Models underperform humans on narrative prediction tasks

02. Both text and image modalities are essential for understanding comics

03. COMICS dataset reveals fundamental challenges in multimodal narrative comprehension

Abstract

Visual narrative is often a combination of explicit information and judicious omissions, relying on the viewer to supply missing details. In comics, most movements in time and space are hidden in the "gutters" between panels. To follow the story, readers logically connect panels together by inferring unseen actions through a process called "closure". While computers can now describe what is explicitly depicted in natural images, in this paper we examine whether they can understand the closure-driven narratives conveyed by stylized artwork and dialogue in comic book panels. We construct a dataset, COMICS, that consists of over 1.2 million panels (120 GB) paired with automatic textbox transcriptions. An in-depth analysis of COMICS demonstrates that neither text nor image alone can tell a comic book story, so a computer must understand both modalities to keep up with the plot. We introduce…

Taxonomy

Topics: Video Analysis and Summarization · Multimodal Machine Learning Applications · Digital Storytelling and Education