Deep Active Object Recognition by Joint Label and Action Prediction

Mohsen Malmir; Karan Sikka; Deborah Forster; Ian Fasel; Javier R. Movellan; Garrison W. Cottrell

arXiv:1512.05484·cs.AI·December 18, 2015

TL;DR

This paper introduces a deep neural network for active object recognition that jointly predicts object labels and selects actions to improve recognition accuracy, using reinforcement learning and a Dirichlet-based state encoding.

Contribution

The paper presents a framework in which a single network jointly predicts the object label and the next action, using a Dirichlet-based state encoding for active recognition.

Findings

01

With Dirichlet state encoding, the proposed model reaches higher label prediction accuracy than with Naive Bayes state encoding.

02

Joint training improves both action selection and label prediction.

03

Dirichlet encoding enhances the system's performance on the GERMS dataset.

Abstract

An active object recognition system has the advantage of being able to act in the environment to capture images that are more suited for training and that lead to better performance at test time. In this paper, we propose a deep convolutional neural network for active object recognition that simultaneously predicts the object label, and selects the next action to perform on the object with the aim of improving recognition performance. We treat active object recognition as a reinforcement learning problem and derive the cost function to train the network for joint prediction of the object label and the action. A generative model of object similarities based on the Dirichlet distribution is proposed and embedded in the network for encoding the state of the system. The training is carried out by simultaneously minimizing the label and action prediction errors using gradient descent. We…
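
The abstract describes training by minimizing label and action prediction errors together. The sketch below illustrates one way such a combined objective could look; it is not the cost function derived in the paper. It assumes a cross-entropy term for the label and a one-step Q-learning (temporal-difference) error for the action. The function names, the discount `gamma`, and the weighting `action_weight` are all hypothetical choices made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_loss(label_logits, true_label):
    """Cross-entropy between the predicted label distribution and the true label."""
    probs = softmax(label_logits)
    return -math.log(probs[true_label])


def action_loss(q_values, action, reward, next_q_values, gamma=0.9):
    """Squared one-step TD error for the action that was taken.

    Assumption: a standard Q-learning target; the paper may use a different one.
    """
    target = reward + gamma * max(next_q_values)
    return (target - q_values[action]) ** 2


def joint_loss(label_logits, true_label, q_values, action, reward,
               next_q_values, gamma=0.9, action_weight=1.0):
    """Sum of the label and action errors, minimized together (hypothetical weighting)."""
    return (label_loss(label_logits, true_label)
            + action_weight * action_loss(q_values, action, reward,
                                          next_q_values, gamma))
```

In a real network both terms would be computed from shared convolutional features, so that gradient descent on the sum updates the label head and the action head at the same time.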
