Event Specific Multimodal Pattern Mining with Image-Caption Pairs
Hongzhi Li, Joseph G. Ellis, Shih-Fu Chang

TL;DR
This paper introduces a multimodal pattern mining framework that leverages image-caption pairs from news events to discover semantically meaningful image patches with human-recognizable names, outperforming vision-only methods.
Contribution
The authors propose a novel multimodal pattern mining approach that uses captions to learn and name image patch patterns, specifically tailored for news event images, with a new evaluation framework.
Findings
Discovered patterns are 26.2% more semantically meaningful than those found by a vision-only pipeline.
Achieves 54.5% accuracy in tagging image patches without supervision.
Discovers named patterns beyond existing datasets like ImageNet.
Abstract
In this paper we describe a novel framework and algorithms for discovering image patch patterns from a large corpus of weakly supervised image-caption pairs generated from news events. Current pattern mining techniques attempt to find patterns that are representative and discriminative; we stipulate that our discovered patterns must also be recognizable by humans, preferably with meaningful names. We propose a new multimodal pattern mining approach that leverages the descriptive captions often accompanying news images to learn semantically meaningful image patch patterns. The multimodal patterns are then named using words mined from the associated image captions for each pattern. A novel evaluation framework is provided that demonstrates our patterns are 26.2% more semantically meaningful than those discovered by the state-of-the-art vision-only pipeline, and that we can provide…
Taxonomy
Topics: Topic Modeling · Advanced Image and Video Retrieval Techniques · Text and Document Classification Technologies
