Event Recognition with Automatic Album Detection based on Sequential Processing, Neural Attention and Image Captioning

Andrey V. Savchenko

arXiv:1911.11010·cs.CV·January 16, 2020·1 cites

Open Access

TL;DR

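This paper introduces a two-stage event recognition method for photo galleries in which albums are not known in advance. Sequential photos are grouped into albums using per-photo classifier scores, and the features of each group are aggregated with neural attention. An optional image captioning extension is also examined. The method is more accurate than single-photo recognition and than grouping by hierarchical clustering.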

Contribution

The paper proposes an approach that combines sequential clustering, neural attention, and image captioning for event recognition in photo galleries whose album boundaries are unknown, outperforming single-photo recognition and hierarchical clustering.

Findings

01. Achieves 9-20% higher accuracy than event recognition on single photos.

02. Reduces the error rate by 13-16% compared with hierarchical clustering.

03. Image captions produced by a model trained on Conceptual Captions improve classification accuracy.

Abstract

In this paper, a new formulation of the event recognition task is examined: event categories must be predicted for a gallery of images in which albums (groups of photos corresponding to a single event) are unknown. We propose a novel two-stage approach. First, features are extracted from each photo using a pre-trained convolutional neural network (CNN). These features are classified individually. The classifier scores are used to group sequential photos into several clusters. Finally, the features of the photos in each group are aggregated into a single descriptor using a neural attention mechanism. This algorithm is optionally extended to improve the accuracy of classifying each image in an album. In contrast to conventional fine-tuning of CNNs, we propose to use image captioning, i.e., a generative model that converts images to textual…
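The two stages described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the Euclidean distance between consecutive score vectors, the fixed `threshold`, and the dot-product attention with a `query` vector are all assumptions chosen for illustration, and the function names are hypothetical. In practice the attention parameters would be learned; here they are passed in by hand.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def group_sequential(scores, threshold):
    """Split an ordered gallery into runs of consecutive photos.

    scores: list of per-photo classifier score vectors, in gallery order.
    A photo joins the current group when its scores are closer than
    `threshold` (an assumed hyperparameter) to those of the previous photo;
    otherwise it starts a new group. Returns lists of photo indices.
    """
    if not scores:
        return []
    groups = [[0]]
    for i in range(1, len(scores)):
        if l2_distance(scores[i - 1], scores[i]) < threshold:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, query):
    """Aggregate a group's feature vectors into one descriptor.

    Each photo gets a weight from the softmax of its dot product with
    `query` (assumed to be learned elsewhere); the descriptor is the
    weighted sum of the feature vectors.
    """
    logits = [sum(f_k * q_k for f_k, q_k in zip(f, query)) for f in features]
    weights = softmax(logits)
    dim = len(features[0])
    return [sum(w * f[k] for w, f in zip(weights, features)) for k in range(dim)]


# Toy gallery: two photos that score like event 0, then two like event 1.
scores = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
groups = group_sequential(scores, threshold=0.5)

# With an all-zero query every photo gets the same weight (plain averaging).
descriptor = attention_pool([[1.0, 0.0], [0.0, 1.0]], query=[0.0, 0.0])
```

Each resulting group would then be classified from its aggregated descriptor. Grouping only consecutive photos keeps the pass linear in the gallery size, which is one plausible reason to prefer it over clustering all pairs.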

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Video Analysis and Summarization