Towards Object Detection from Motion
Rico Jonschkowski, Austin Stone

TL;DR
This paper introduces a weakly supervised object detection method that learns to detect a new object from motion cues. It needs only short videos (one of the moving object and one or more "negative" videos of the scene without the object) and no annotated images.
Contribution
It trains an object detector to produce physically plausible object motion on a video of the moving object and to detect nothing in negative videos of the same scene, without any object location annotations.
Findings
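Learns to locate objects from motion cues without object location annotations
Performs object detection on single images once trained
Evaluated in three robotics settings: observing moving objects, watching demonstrations of object manipulation, and physically interacting with objects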
Abstract
We present a novel approach to weakly supervised object detection. Instead of annotated images, our method only requires two short videos to learn to detect a new object: 1) a video of a moving object and 2) one or more "negative" videos of the scene without the object. The key idea of our algorithm is to train the object detector to produce physically plausible object motion when applied to the first video and to not detect anything in the second video. With this approach, our method learns to locate objects without any object location annotations. Once the model is trained, it performs object detection on single images. We evaluate our method in three robotics settings that afford learning objects from motion: observing moving objects, watching demonstrations of object manipulation, and physically interacting with objects (see a video summary at https://youtu.be/BH0Hv3zZG_4).
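The key idea in the abstract, rewarding plausible motion on the positive video and penalizing any detection on the negative video, can be illustrated with a toy objective. This is only a minimal sketch under stated assumptions, not the authors' loss: the abstract does not specify the objective, so the function names, the squared-velocity smoothness penalty, the per-frame presence probability, and the `negative_weight` parameter are all hypothetical choices made for illustration.

```python
import numpy as np


def motion_plausibility_loss(positions):
    """Assumed stand-in for 'physically plausible motion'.

    positions: (T, 2) array of predicted (x, y) object locations,
    one per frame of the positive video. Large frame-to-frame jumps
    are penalized via the mean squared displacement.
    """
    positions = np.asarray(positions, dtype=float)
    velocities = np.diff(positions, axis=0)  # (T-1, 2) displacements
    return float(np.mean(np.sum(velocities ** 2, axis=1)))


def negative_video_loss(presence_probs):
    """Assumed penalty for detecting anything in the negative video.

    presence_probs: (T,) array of predicted probabilities that the
    object is present in each frame of the negative video.
    """
    presence_probs = np.asarray(presence_probs, dtype=float)
    return float(np.mean(presence_probs))


def total_loss(positions, presence_probs, negative_weight=1.0):
    """Combine both terms; negative_weight is a hypothetical knob."""
    return (motion_plausibility_loss(positions)
            + negative_weight * negative_video_loss(presence_probs))


# A smooth track scores lower than a track that jumps back and forth.
smooth_track = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
jumpy_track = [[0.0, 0.0], [5.0, 0.0], [0.0, 0.0]]
print(motion_plausibility_loss(smooth_track))
print(motion_plausibility_loss(jumpy_track))
```

Note that a smoothness term on its own is minimized by a constant prediction, so a practical objective would need an additional term that rules out this collapsed solution. That term is omitted here to keep the sketch short.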
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques
