Annotation Order Matters: Recurrent Image Annotator for Arbitrary Length Image Tagging
Jiren Jin, Hideki Nakayama

TL;DR
This paper introduces a Recurrent Image Annotator model that generates image tags as variable-length sequences, aligning more naturally with human annotation practices and improving flexibility over fixed top-k methods.
Contribution
The paper presents a novel sequence-based image annotation model that predicts variable-length tags and highlights the importance of tag order during training for better performance.
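Because the model is trained on tag sequences, each image's unordered tag set has to be turned into an ordered target sequence first. The sketch below shows one way to do that. It assumes frequency-based orderings plus an alphabetical baseline and a hypothetical "<end>" token. The strategy names and helpers are illustrative only and are not taken from the paper, and the sketch makes no claim about which ordering the authors found best.

```python
from collections import Counter

END = "<end>"  # hypothetical end-of-sequence token (illustrative assumption)


def tag_frequencies(dataset):
    """Count how often each tag occurs across all training annotations."""
    return Counter(tag for tags in dataset for tag in tags)


def order_tags(tags, freq, strategy):
    """Convert an unordered tag set into a target sequence terminated by END.

    The strategy names are hypothetical labels for this sketch.
    """
    # Alphabetical base order; Python's sort is stable, so ties in the
    # frequency-based orders below keep this alphabetical order.
    ordered = sorted(set(tags))
    if strategy == "frequent_first":
        ordered.sort(key=lambda t: -freq[t])
    elif strategy == "rare_first":
        ordered.sort(key=lambda t: freq[t])
    elif strategy != "alphabetical":
        raise ValueError(f"unknown strategy: {strategy}")
    return ordered + [END]


# Toy annotations, made up for illustration.
dataset = [{"sky", "cloud", "tree"}, {"sky", "sea"}, {"sky", "tree", "person"}]
freq = tag_frequencies(dataset)
print(order_tags({"sky", "cloud", "tree"}, freq, "frequent_first"))
# ['sky', 'tree', 'cloud', '<end>']
print(order_tags({"sky", "cloud", "tree"}, freq, "rare_first"))
# ['cloud', 'tree', 'sky', '<end>']
```

The same tag set yields different target sequences under each rule. This is why the training order can change what a sequence model learns to emit first.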
Findings
RIA outperforms traditional fixed top-k methods.
Tag order in training significantly impacts annotation accuracy.
The model serves as a high-quality baseline for arbitrary-length image tagging.
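Fixed top-k evaluation can penalize an image that has fewer (or more) relevant tags than k. The sketch below assumes per-image F1 as the metric and uses made-up tags, not data from the paper. It compares cutting a ranked list at k with keeping a variable-length sequence up to a hypothetical "<end>" token. It is not a reproduction of the paper's evaluation protocol.

```python
END = "<end>"  # hypothetical end-of-sequence token (illustrative assumption)


def f1(predicted, truth):
    """Per-image F1 between predicted and ground-truth tag sets."""
    predicted, truth = set(predicted), set(truth)
    hits = len(predicted & truth)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(truth)
    return 2 * precision * recall / (precision + recall)


def top_k(ranked_tags, k):
    """Fixed-length prediction: always keep exactly the k highest-ranked tags."""
    return ranked_tags[:k]


def until_end(sequence, end=END):
    """Variable-length prediction: keep tags up to (not including) the end token."""
    return sequence[:sequence.index(end)] if end in sequence else list(sequence)


# Toy example: an image whose ground truth has only two tags.
truth = {"sky", "sea"}
ranked = ["sky", "sea", "tree", "cloud", "person"]
print(f1(top_k(ranked, 5), truth))                     # about 0.571 (precision 0.4, recall 1.0)
print(f1(until_end(["sky", "sea", END]), truth))       # 1.0
```

With k fixed at 5, the three extra tags cost precision even though both true tags were found. A sequence that stops on its own avoids that penalty.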
Abstract
Automatic image annotation has been an important research topic in facilitating large-scale image management and retrieval. Existing methods focus on learning image-tag correlation or correlation between tags to improve annotation accuracy. However, most of these methods evaluate their performance using top-k retrieval performance, where k is fixed. Although such a setting is convenient for comparing different methods, it is not the natural way that humans annotate images. The number of annotated tags should depend on image contents. Inspired by the recent progress in machine translation and image captioning, we propose a novel Recurrent Image Annotator (RIA) model that formulates the image annotation task as a sequence generation problem so that RIA can natively predict the proper number of tags according to image contents. We evaluate the proposed model on various image annotation datasets.…
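Treating annotation as sequence generation means the number of tags falls out of the decoding loop: the model keeps emitting tags until it predicts an end token. The sketch below shows greedy decoding under stated assumptions. `score_fn` is a hypothetical stand-in for the recurrent decoder and returns a score per candidate tag, including "<end>", given the image and the tags emitted so far. Repeated tags are suppressed, and `max_len` is an illustrative safety cap. The toy scorer ignores the prefix, which a real recurrent model would not.

```python
END = "<end>"  # hypothetical end-of-sequence token (illustrative assumption)


def greedy_annotate(score_fn, image, max_len=10):
    """Greedily emit tags until END wins or max_len tags have been produced.

    score_fn(image, prefix) -> {tag: score} must include END, so there is
    always at least one candidate that has not been emitted yet.
    """
    tags = []
    for _ in range(max_len):
        scores = score_fn(image, tags)
        # Skip tags already emitted so the output is a set-like sequence.
        candidates = {t: s for t, s in scores.items() if t not in tags}
        best = max(candidates, key=candidates.get)
        if best == END:
            break
        tags.append(best)
    return tags


def toy_scores(image, prefix):
    """Hypothetical scorer: fixed per-image tag scores plus a constant END score."""
    return {**image, END: 0.5}


print(greedy_annotate(toy_scores, {"sky": 0.9, "sea": 0.7, "tree": 0.2}))
# ['sky', 'sea']
print(greedy_annotate(toy_scores, {"person": 0.8, "dog": 0.6, "grass": 0.55, "sky": 0.1}))
# ['person', 'dog', 'grass']
```

The two toy images receive two and three tags respectively, with no k chosen in advance. The length depends only on when "<end>" becomes the best-scoring option.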
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
