Reward Finetuning for Faster and More Accurate Unsupervised Object Discovery

Katie Z Luo; Zhenzhen Liu; Xiangyu Chen; Yurong You; Sagie Benaim; Cheng Perng Phoo; Mark Campbell; Wen Sun; Bharath Hariharan; Kilian Q. Weinberger

arXiv:2310.19080 · cs.CV · November 7, 2023 · 1 citation

TL;DR

This paper introduces a reinforcement learning-based method for unsupervised object discovery from LiDAR data, using heuristics as feedback to improve accuracy and training speed without labeled data.

Contribution

It adapts RLHF techniques to unsupervised object detection, combining heuristics into a reward function to enhance detection accuracy and training efficiency.

Findings

01. More accurate object detection from LiDAR data.

02. Orders of magnitude faster training compared to prior methods.

03. Effective use of heuristics as surrogate feedback.

Abstract

Recent advances in machine learning have shown that Reinforcement Learning from Human Feedback (RLHF) can improve machine learning models and align them with human preferences. Although very successful for Large Language Models (LLMs), these advancements have not had a comparable impact in research for autonomous vehicles -- where alignment with human expectations can be imperative. In this paper, we propose to adapt similar RL-based methods to unsupervised object discovery, i.e. learning to detect objects from LiDAR points without any training labels. Instead of labels, we use simple heuristics to mimic human feedback. More explicitly, we combine multiple heuristics into a simple reward function that positively correlates its score with bounding box accuracy, i.e., boxes containing objects are scored higher than those without. We start from the detector's own predictions to explore the…
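The idea of combining several heuristics into one reward that scores boxes containing objects above empty ones can be illustrated with a minimal sketch. This is not the authors' implementation: the axis-aligned `Box` format, the two heuristics (`point_count_score`, `size_prior_score`), the thresholds, and the equal weights are all hypothetical choices made only to show the shape of such a reward function.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class Box:
    """Axis-aligned 3D box: center plus full side lengths (assumed, simplified format)."""
    cx: float
    cy: float
    cz: float
    dx: float
    dy: float
    dz: float

    def contains(self, p: Point) -> bool:
        return (
            abs(p[0] - self.cx) <= self.dx / 2
            and abs(p[1] - self.cy) <= self.dy / 2
            and abs(p[2] - self.cz) <= self.dz / 2
        )


def point_count_score(box: Box, points: Sequence[Point], saturation: int = 20) -> float:
    """Hypothetical heuristic: more enclosed LiDAR points give a higher score, capped at 1."""
    enclosed = sum(1 for p in points if box.contains(p))
    return min(enclosed / saturation, 1.0)


def size_prior_score(box: Box, min_side: float = 0.3, max_side: float = 12.0) -> float:
    """Hypothetical heuristic: 1 if every side length is in a plausible range, else 0."""
    sides = (box.dx, box.dy, box.dz)
    return 1.0 if all(min_side <= s <= max_side for s in sides) else 0.0


def reward(box: Box, points: Sequence[Point],
           weights: Tuple[float, float] = (0.5, 0.5)) -> float:
    """Weighted sum of the heuristic scores (the weights are illustrative assumptions)."""
    scores = (point_count_score(box, points), size_prior_score(box))
    return sum(w * s for w, s in zip(weights, scores))


# Toy scene: ten points along the x-axis near the origin.
points = [(0.1 * i, 0.0, 0.0) for i in range(10)]
box_on_object = Box(0.5, 0.0, 0.0, 2.0, 1.0, 1.0)
box_on_empty_space = Box(10.0, 10.0, 0.0, 2.0, 1.0, 1.0)

r_object = reward(box_on_object, points)
r_empty = reward(box_on_empty_space, points)
```

In this toy scene the box placed over the points receives a higher reward than the same-sized box placed over empty space, which is the ordering property the abstract asks of the reward.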

Code & Models

Repositories

katieluo88/drift
pytorch

Videos

Reward Finetuning for Faster and More Accurate Unsupervised Object Discovery · slideslive

Taxonomy

Topics: Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI) · Machine Learning and Algorithms

Methods: ALIGN