Robot In a Room: Toward Perfect Object Recognition in Closed Environments

Shuran Song; Linguang Zhang; Jianxiong Xiao

arXiv:1507.02703 · cs.CV · July 13, 2015 · 23 citations

TL;DR

This paper presents a method for robots to achieve near-human object recognition accuracy in closed environments by leveraging environment constraints, 3D mapping, and crowd-sourced annotation, demonstrating promising results for practical robotic vision.

Contribution

The paper introduces a system that combines 3D mapping with crowd-sourced annotation so that a robot can recognize objects reliably in a closed environment such as a house or an office.

Findings

01 High recognition accuracy in closed environments

02 Effective background subtraction using 3D maps

03 Feasibility of crowd-sourced annotation for object labeling

Abstract

While general object recognition is still far from being solved, this paper proposes a way for a robot to recognize every object at almost human-level accuracy. Our key observation is that many robots will stay in a relatively closed environment (e.g., a house or an office). By constraining a robot to stay in a limited territory, we can ensure that the robot has seen most objects before and that new objects are introduced slowly. Furthermore, we can build a 3D map of the environment to reliably subtract the background and make recognition easier. We propose extremely robust algorithms to obtain a 3D map and enable humans to collectively annotate objects. At test time, our algorithm can recognize all objects very reliably, and it queries humans on a crowdsourcing platform if confidence is low or new objects are identified. This paper explains design decisions in building such…
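The two mechanisms the abstract describes, subtracting the mapped background and falling back to a human when confidence is low, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names (foreground_mask, recognize_or_ask), the depth tolerance of 0.05 m, the 0.9 confidence threshold, and the idea of comparing observed depth against depth rendered from the map are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple


def foreground_mask(
    observed_depth: List[List[float]],
    map_depth: List[List[float]],
    tolerance_m: float = 0.05,  # assumed tolerance, not taken from the paper
) -> List[List[bool]]:
    """Mark pixels that are closer to the camera than the static 3D map predicts.

    Both inputs are depth images in meters with the same shape. A depth of 0
    is treated as a missing reading and is never marked as foreground.
    """
    return [
        [obs > 0 and (bg - obs) > tolerance_m for obs, bg in zip(obs_row, bg_row)]
        for obs_row, bg_row in zip(observed_depth, map_depth)
    ]


def recognize_or_ask(
    scores: Dict[str, float],
    ask_human: Callable[[], str],
    min_confidence: float = 0.9,  # assumed threshold, not taken from the paper
) -> Tuple[str, bool]:
    """Return (label, asked_human).

    Accept the top-scoring label when it is confident enough; otherwise
    (or when there are no candidate labels at all) defer to a human annotator.
    """
    if scores:
        label, confidence = max(scores.items(), key=lambda kv: kv[1])
        if confidence >= min_confidence:
            return label, False
    return ask_human(), True
```

In this sketch, ask_human stands in for a request to a crowdsourcing platform; a label returned that way could then be stored so that the same object is recognized automatically the next time it appears.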

Taxonomy

Topics: Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications

Methods: SPEED (Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings)