PETA: Photo Albums Event Recognition using Transformers Attention
Tamar Glaser, Emanuel Ben-Baruch, Gilad Sharir, Nadav Zamir, Asaf Noy, Lihi Zelnik-Manor

TL;DR
This paper introduces PETA, a transformer-based approach for recognizing events in personal photo albums, effectively handling disordered images and achieving state-of-the-art results on multiple benchmarks.
Contribution
It combines CNNs and transformers for global reasoning in photo albums, addressing limitations of previous temporal models and exploring image importance correlation.
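The idea of attending over an unordered set of images can be illustrated with a small sketch. This is not PETA's actual architecture, which this summary does not specify in detail. It is a minimal, assumed single-query attention pooling over per-image feature vectors, with random numbers standing in for CNN embeddings. The names `attention_pool`, `query`, and `album_feats` are hypothetical. Because no positional information is used, shuffling the album leaves the pooled representation unchanged, which is the property that makes attention a natural fit for disordered albums.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(embeddings, query):
    """Pool a set of per-image embeddings into one album-level vector.

    embeddings: list of equal-length float lists (stand-ins for CNN features).
    query: hypothetical learned query vector of the same length.
    Returns (album_vector, attention_weights).
    """
    dim = len(query)
    scale = math.sqrt(dim)
    # Scaled dot-product score of each image against the query.
    scores = [sum(q * x for q, x in zip(query, emb)) / scale for emb in embeddings]
    weights = softmax(scores)
    # Weighted sum of image embeddings; no positional encoding is involved.
    album = [sum(w * emb[i] for w, emb in zip(weights, embeddings)) for i in range(dim)]
    return album, weights


rng = random.Random(0)
# Assumption: 5 images with 8-dimensional features, random for illustration only.
album_feats = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(5)]
query = [rng.gauss(0, 1) for _ in range(8)]

pooled, weights = attention_pool(album_feats, query)

# Shuffle the album and pool again: the result should match up to rounding.
shuffled = album_feats[:]
rng.shuffle(shuffled)
pooled_shuffled, _ = attention_pool(shuffled, query)

order_invariant = all(
    math.isclose(a, b, abs_tol=1e-9) for a, b in zip(pooled, pooled_shuffled)
)
```

In this sketch the per-image `weights` play the role that learned attention plays in the paper: images with higher weight contribute more to the album-level vector, which is the kind of signal that can be compared against human importance annotations.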
Findings
Achieves above 90% mAP on all tested datasets.
Outperforms previous methods on three benchmarks.
Demonstrates correlation between learned attention and human-annotated importance.
Abstract
In recent years the amounts of personal photos captured increased significantly, giving rise to new challenges in multi-image understanding and high-level image understanding. Event recognition in personal photo albums presents one challenging scenario where life events are recognized from a disordered collection of images, including both relevant and irrelevant images. Event recognition in images also presents the challenge of high-level image understanding, as opposed to low-level image object classification. In absence of methods to analyze multiple inputs, previous methods adopted temporal mechanisms, including various forms of recurrent neural networks. However, their effective temporal window is local. In addition, they are not a natural choice given the disordered characteristic of photo albums. We address this gap with a tailor-made solution, combining the power of CNNs for…
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Domain Adaptation and Few-Shot Learning
