Order-Free RNN with Visual Attention for Multi-Label Classification
Shang-Fu Chen, Yi-Chen Chen, Chih-Kuan Yeh, Yu-Chiang Frank Wang

TL;DR
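This paper introduces an order-free RNN with visual attention for multi-label image classification. The model does not need a pre-defined label order during training, which makes inference more robust to error propagation, and its attention component helps identify objects of varying sizes.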
Contribution
It presents a novel joint attention and LSTM model that handles unordered multi-label classification without pre-defined label sequences.
Findings
Addresses label order dependency in multi-label classification.
Effectively identifies objects of varying sizes without prior label order.
Utilizes beam search for efficient multi-label prediction.
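The beam-search finding can be illustrated with a minimal sketch. This is a generic beam search over label sets, not the authors' exact procedure. The names `beam_search_labels`, `step_log_probs`, `toy_model`, and the `END` stop token are hypothetical, and the scores in the toy model are made up for illustration.

```python
import math

END = "<end>"  # hypothetical stop token, not taken from the paper


def beam_search_labels(step_log_probs, beam_width=2, max_labels=5):
    """Return (labels, score) for the best finished hypothesis.

    step_log_probs(prefix) is assumed to return a dict that maps each
    candidate label (and END) to a log-probability, given the labels
    predicted so far.
    """
    beams = [((), 0.0)]  # (label prefix, cumulative log-probability)
    finished = []
    for _ in range(max_labels + 1):
        candidates = []
        for prefix, score in beams:
            for label, lp in step_log_probs(prefix).items():
                if label == END:
                    finished.append((prefix, score + lp))
                elif label not in prefix and len(prefix) < max_labels:
                    # A label set has no repeats, so skip labels already chosen.
                    candidates.append((prefix + (label,), score + lp))
        if not candidates:
            break
        # Keep only the beam_width highest-scoring partial hypotheses.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    if not finished:
        finished = beams  # fall back to unfinished hypotheses
    return max(finished, key=lambda c: c[1])


def toy_model(prefix):
    # Made-up scores: "dog" becomes likelier once "person" is predicted,
    # which mimics label co-occurrence.
    if not prefix:
        probs = {"person": 0.6, "dog": 0.2, "car": 0.1, END: 0.1}
    elif prefix == ("person",):
        probs = {"dog": 0.6, "car": 0.1, END: 0.3}
    else:
        probs = {"car": 0.1, END: 0.9}
    return {label: math.log(p) for label, p in probs.items()}


labels, score = beam_search_labels(toy_model, beam_width=2, max_labels=3)
print(sorted(labels), round(score, 3))
```

Keeping several hypotheses alive at each step is what limits the damage from one early mistake: a greedy decoder commits to its first label, while the beam can still recover a better-scoring set later.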
Abstract
In this paper, we propose the joint learning of attention and recurrent neural network (RNN) models for multi-label classification. While approaches based on the use of either model exist (e.g., for the task of image captioning), training such existing network architectures typically requires pre-defined label sequences. For multi-label classification, it would be desirable to have a robust inference process, so that the prediction error would not propagate and thus affect the performance. Our proposed model uniquely integrates attention and Long Short-Term Memory (LSTM) models, which not only addresses the above problem but also allows one to identify visual objects of interest with varying sizes without prior knowledge of a particular label ordering. More importantly, label co-occurrence information can be jointly exploited by our LSTM model. Finally, by advancing the technique of beam…
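To make the attention idea in the abstract concrete, here is a minimal sketch of generic dot-product soft attention over image regions. It is an assumption-laden illustration, not the paper's architecture: the function names `softmax` and `attend`, the two-dimensional region features, and the use of a plain dot product as the scoring function are all hypothetical choices.

```python
import math


def softmax(xs):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(hidden, regions):
    """Weight each region feature by its similarity to the hidden state.

    hidden:  list of floats (e.g. a recurrent state; assumed here)
    regions: list of region feature vectors with the same length as hidden
    Returns (weights, context), where context is the weighted sum of regions.
    """
    scores = [sum(h * r for h, r in zip(hidden, region)) for region in regions]
    weights = softmax(scores)
    dim = len(regions[0])
    context = [
        sum(w * region[d] for w, region in zip(weights, regions))
        for d in range(dim)
    ]
    return weights, context


# Toy example with three regions described by 2-D features (made-up values).
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
hidden = [2.0, 0.0]
weights, context = attend(hidden, regions)
print([round(w, 3) for w in weights], [round(c, 3) for c in context])
```

In a recurrent setup, the context vector would typically be fed into the next recurrent step, so each predicted label can focus on a different part of the image.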
Taxonomy
Topics: Text and Document Classification Technologies · Machine Learning in Bioinformatics · Image Retrieval and Classification Techniques
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
