VIP: Finding Important People in Images
Clint Solomon Mathialagan, Andrew C. Gallagher, Dhruv Batra

TL;DR
This paper introduces a method to automatically determine the importance of individuals in group photos using visual cues, improving image captioning and related applications.
Contribution
It proposes a new importance measure for people in images and demonstrates its effectiveness in predicting importance and enhancing image captioning tasks.
Findings
The importance of people can be predicted automatically from purely visual cues
Importance correlates with visual saliency but is not identical to it
Incorporating predicted importance significantly improves applications such as im2text (generating sentences that describe images)
Abstract
People preserve memories of events such as birthdays, weddings, or vacations by capturing photos, often depicting groups of people. Invariably, some individuals in the image are more important than others given the context of the event. This paper analyzes the concept of the importance of individuals in group photographs. We address two specific questions -- Given an image, who are the most important individuals in it? Given multiple images of a person, which image depicts the person in the most important role? We introduce a measure of importance of people in images and investigate the correlation between importance and visual saliency. We find that not only can we automatically predict the importance of people from purely visual cues, incorporating this predicted importance results in significant improvement in applications such as im2text (generating sentences that describe images of…
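The abstract mentions investigating the correlation between importance and visual saliency. As a rough illustration only, the sketch below computes a Spearman rank correlation between two per-person score lists. The choice of Spearman correlation, the function names, and the example scores are all assumptions made for illustration; none of them come from the paper.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for four people in one image (not data from the paper).
importance = [0.9, 0.4, 0.7, 0.1]
saliency = [0.6, 0.5, 0.8, 0.2]
rho = spearman(importance, saliency)
print(f"Spearman correlation: {rho:.2f}")
```

A value near 1 would mean the two scores rank people almost identically, while a clearly lower value would indicate that saliency alone does not capture importance.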
Taxonomy
Topics: Visual Attention and Saliency Detection · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques
