MOD-UV: Learning Mobile Object Detectors from Unlabeled Videos

Yihong Sun; Bharath Hariharan

arXiv:2405.14841·cs.CV·August 1, 2024

TL;DR

MOD-UV is an unsupervised mobile object detector that learns from unlabeled videos using motion cues. It achieves state-of-the-art unsupervised detection results on Waymo, nuScenes, and KITTI without external data or supervised models.

Contribution

MOD-UV introduces a training paradigm that progressively discovers small objects and static-but-mobile objects (objects that can move but are stationary in the observed frames), improving unsupervised detection learned from unlabeled videos.

Findings

1. Achieves state-of-the-art unsupervised detection on the Waymo, nuScenes, and KITTI datasets.
2. Detects and segments mobile objects from a single static image.
3. Requires no external data or supervised models.

Abstract

Embodied agents must detect and localize objects of interest, e.g., traffic participants for self-driving cars. Supervision in the form of bounding boxes for this task is extremely expensive. As such, prior work has looked at unsupervised instance detection and segmentation, but in the absence of annotated boxes, it is unclear how pixels must be grouped into objects and which objects are of interest. This results in over-/under-segmentation and irrelevant objects. Inspired by the human visual system and practical applications, we posit that the key missing cue for unsupervised detection is motion: objects of interest are typically mobile objects that frequently move, and their motions can specify separate instances. In this paper, we propose MOD-UV, a Mobile Object Detector learned from Unlabeled Videos only. We begin with instance pseudo-labels derived from motion segmentation, but introduce…
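The abstract's starting point, instance pseudo-labels derived from motion segmentation, can be illustrated with a toy example. The sketch below is not the authors' pipeline: it assumes a binary motion mask is already available (1 = pixel judged to be moving) and uses plain 4-connected component grouping as a stand-in for whatever instance separation the paper actually uses. The function name `boxes_from_motion_mask` and the `min_area` parameter are hypothetical, chosen only for this illustration.

```python
from collections import deque


def boxes_from_motion_mask(mask, min_area=1):
    """Group moving pixels into 4-connected regions and return one
    (x_min, y_min, x_max, y_max) box per region.

    Illustrative assumption: `mask` is a list of rows of 0/1 values.
    This is a simple stand-in, not the method described in the paper.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Flood-fill one connected region of moving pixels.
            queue = deque([(y, x)])
            seen[y][x] = True
            x_min = x_max = x
            y_min = y_max = y
            area = 0
            while queue:
                cy, cx = queue.popleft()
                area += 1
                x_min, x_max = min(x_min, cx), max(x_max, cx)
                y_min, y_max = min(y_min, cy), max(y_max, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Drop tiny regions, which are more likely to be noise.
            if area >= min_area:
                boxes.append((x_min, y_min, x_max, y_max))
    return boxes


# Toy mask with two separate moving regions.
toy_mask = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
]
pseudo_boxes = boxes_from_motion_mask(toy_mask)
```

Boxes produced this way cover only objects that are moving and large enough to survive the area filter, which is consistent with the listing's note that the method goes on to discover small and static-but-mobile objects.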

Code & Models

Repositories

yihongsun/mod-uv (PyTorch, official)

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques