Understanding Infographics through Textual and Visual Tag Prediction
Zoya Bylinskii, Sami Alsheikh, Spandan Madan, Adria Recasens, Kimberli Zhong, Hanspeter Pfister, Fredo Durand, Aude Oliva

TL;DR
This paper proposes a method for automatically identifying key visual and textual elements in infographics to understand their content, using a two-step process involving text extraction and visual localization.
Contribution
It introduces a novel approach for visual hashtag discovery in infographics by leveraging predicted text tags as supervisory signals for visual element localization.
Findings
High accuracy in categorization and multi-label tag prediction
Visual hashtags closely match human annotations
Effective use of text tags for visual element localization
Abstract
We introduce the problem of visual hashtag discovery for infographics: extracting visual elements from an infographic that are diagnostic of its topic. Given an infographic as input, our computational approach automatically outputs textual and visual elements predicted to be representative of the infographic content. Concretely, from a curated dataset of 29K large infographic images sampled across 26 categories and 391 tags, we present an automated two-step approach. First, we extract the text from an infographic and use it to predict text tags indicative of the infographic content. Second, we use these predicted text tags as a supervisory signal to localize the most diagnostic visual elements from within the infographic, i.e., visual hashtags. We report performance on a categorization and multi-label tag prediction problem and compare our proposed visual hashtags to human…
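The two-step flow described in the abstract can be illustrated with a toy sketch. This is not the authors' method: the summary does not specify their models, so the keyword-count tag scorer, the `TAG_KEYWORDS` vocabulary, and the precomputed per-patch scores below are all hypothetical stand-ins. In the paper, the predicted text tags supervise a visual model; here they are only used to rank patches that already carry scores.

```python
from collections import Counter

# Hypothetical toy vocabulary (tag -> keywords); not taken from the paper.
TAG_KEYWORDS = {
    "food": {"recipe", "calories", "sugar"},
    "travel": {"flight", "hotel", "passport"},
    "health": {"calories", "exercise", "sleep"},
}


def predict_text_tags(text, threshold=0.5):
    """Step 1 (stand-in): multi-label tag prediction from extracted text.

    Each tag is scored by how many of its keywords occur in the text.
    Tags scoring at least `threshold` times the best score are returned.
    """
    words = Counter(text.lower().split())
    scores = {tag: sum(words[w] for w in kws) for tag, kws in TAG_KEYWORDS.items()}
    top = max(scores.values())
    if top == 0:
        return []  # no keyword matched, so no tag is predicted
    return sorted(tag for tag, s in scores.items() if s / top >= threshold)


def rank_patches(patch_scores, predicted_tags, top_k=1):
    """Step 2 (stand-in): pick the patches most diagnostic of the predicted tags.

    `patch_scores` maps a patch id to {tag: score}; the scores are assumed
    to come from some visual model that is not implemented here.
    """
    if not predicted_tags:
        return []
    ranked = sorted(
        patch_scores,
        key=lambda pid: max(patch_scores[pid].get(t, 0.0) for t in predicted_tags),
        reverse=True,
    )
    return ranked[:top_k]


tags = predict_text_tags("low sugar recipe with fewer calories")
patches = {"p0": {"food": 0.2}, "p1": {"food": 0.9}, "p2": {"travel": 0.8}}
best = rank_patches(patches, tags, top_k=2)
```

The text tags act as the bridge between the two steps: only patches that score highly for a tag predicted from the text are kept as candidate visual hashtags.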
Taxonomy
Topics: Image Retrieval and Classification Techniques · Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques
