Robot In a Room: Toward Perfect Object Recognition in Closed Environments
Shuran Song, Linguang Zhang, Jianxiong Xiao

TL;DR
This paper proposes a way for robots to reach almost human-level object recognition accuracy in closed environments such as a house or an office. It relies on three ingredients: the constraint that the robot stays in a limited territory, a 3D map used to subtract the background, and crowd-sourced annotation of objects.
Contribution
The paper introduces a system that combines 3D mapping with crowd-sourced annotation so that a robot confined to a limited environment can recognize the objects in it reliably.
Findings
High recognition accuracy in closed environments
Effective background subtraction using 3D maps
Feasibility of crowd-sourced annotation for object labeling
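
The background-subtraction finding can be illustrated with a minimal sketch. It assumes the 3D map can be rendered into an expected depth image at the robot's current camera pose, and that anything noticeably closer than that expected depth is foreground. The function name `foreground_mask`, the tolerance `tol`, and the convention that 0.0 marks a missing depth reading are all hypothetical choices for illustration; the page does not describe the paper's actual procedure.

```python
from typing import List

# Row-major depth image in meters; 0.0 is assumed to mark a missing reading.
Depth = List[List[float]]


def foreground_mask(observed: Depth, expected: Depth, tol: float = 0.05) -> List[List[bool]]:
    """Mark pixels whose observed depth is closer than the map-rendered
    (expected) depth by more than `tol` meters."""
    mask = []
    for obs_row, exp_row in zip(observed, expected):
        row = []
        for obs, exp in zip(obs_row, exp_row):
            valid = obs > 0.0 and exp > 0.0  # skip pixels with no depth
            row.append(valid and (exp - obs) > tol)
        mask.append(row)
    return mask


# Toy 2x3 example: a flat wall 2 m away with an object in the middle column.
observed = [[2.00, 1.20, 0.0], [2.01, 1.25, 2.00]]
expected = [[2.00, 2.00, 2.00], [2.00, 2.00, 2.00]]
mask = foreground_mask(observed, expected)
```

In this toy example only the middle column is flagged as foreground. The pixel with the missing reading and the pixel that differs from the map by only sensor-level noise both stay in the background.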
Abstract
While general object recognition is still far from being solved, this paper proposes a way for a robot to recognize every object at almost human-level accuracy. Our key observation is that many robots will stay in a relatively closed environment (e.g., a house or an office). By constraining a robot to stay in a limited territory, we can ensure that the robot has seen most objects before and that new objects are introduced slowly. Furthermore, we can build a 3D map of the environment to reliably subtract the background, which makes recognition easier. We propose extremely robust algorithms to obtain a 3D map and enable humans to collectively annotate objects. At test time, our algorithm can recognize all objects very reliably, and it queries humans on a crowdsourcing platform if confidence is low or new objects are identified. This paper explains design decisions in building such…
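
The test-time behavior described in the abstract (answer when confident, otherwise ask a human) can be sketched as a simple threshold rule. This is only an illustration: the function `route_prediction`, the score dictionary, and the threshold of 0.9 are assumptions, not details from the paper, and detecting genuinely new objects is modeled here only as a low or missing score.

```python
from typing import Dict, Optional, Tuple


def route_prediction(scores: Dict[str, float], threshold: float = 0.9) -> Tuple[Optional[str], bool]:
    """Return (label, ask_human).

    ask_human is True when there are no candidate labels or when the
    top score falls below `threshold` (a hypothetical cutoff).
    """
    if not scores:
        return None, True
    label, confidence = max(scores.items(), key=lambda item: item[1])
    if confidence < threshold:
        return None, True
    return label, False


confident = route_prediction({"mug": 0.97, "bowl": 0.03})
uncertain = route_prediction({"mug": 0.55, "bowl": 0.45})
```

Here `confident` keeps the classifier's label, while `uncertain` returns no label and sets the flag that would trigger a crowdsourcing query.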
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
