Reward Finetuning for Faster and More Accurate Unsupervised Object Discovery
Katie Z Luo, Zhenzhen Liu, Xiangyu Chen, Yurong You, Sagie Benaim, Cheng Perng Phoo, Mark Campbell, Wen Sun, Bharath Hariharan, Kilian Q. Weinberger

TL;DR
This paper introduces a reinforcement learning-based method for unsupervised object discovery from LiDAR data, using heuristics as feedback to improve accuracy and training speed without labeled data.
Contribution
It adapts RLHF techniques to unsupervised object detection, combining heuristics into a reward function to enhance detection accuracy and training efficiency.
Findings
More accurate object detection from LiDAR data.
Orders of magnitude faster training compared to prior methods.
Effective use of heuristics as surrogate feedback.
Abstract
Recent advances in machine learning have shown that Reinforcement Learning from Human Feedback (RLHF) can improve machine learning models and align them with human preferences. Although very successful for Large Language Models (LLMs), these advancements have not had a comparable impact in research for autonomous vehicles -- where alignment with human expectations can be imperative. In this paper, we propose to adapt similar RL-based methods to unsupervised object discovery, i.e. learning to detect objects from LiDAR points without any training labels. Instead of labels, we use simple heuristics to mimic human feedback. More explicitly, we combine multiple heuristics into a simple reward function that positively correlates its score with bounding box accuracy, i.e., boxes containing objects are scored higher than those without. We start from the detector's own predictions to explore the…
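To make the idea of a heuristic reward concrete, here is a minimal sketch of how several simple heuristics could be combined into one score per candidate box. It is an illustration only, not the authors' reward: the two heuristics (point density inside the box and closeness to a typical car size), the function names, the size prior, the saturation count, and the equal weights are all assumptions chosen for the example, and boxes are axis-aligned for simplicity.

```python
import numpy as np

def points_in_box(points, center, size):
    """Boolean mask of LiDAR points inside an axis-aligned box (hypothetical helper)."""
    half = np.asarray(size, dtype=float) / 2.0
    return np.all(np.abs(points - np.asarray(center, dtype=float)) <= half, axis=1)

def density_score(points, center, size, saturation=20):
    """Heuristic 1 (assumed): more points inside the box is better, capped at 1."""
    n_inside = int(points_in_box(points, center, size).sum())
    return min(n_inside / saturation, 1.0)

def size_score(size, prior=(4.5, 1.9, 1.6), tolerance=0.5):
    """Heuristic 2 (assumed): boxes close to a typical car size score near 1."""
    size = np.asarray(size, dtype=float)
    prior = np.asarray(prior, dtype=float)
    rel_err = np.abs(size - prior) / prior
    return float(np.exp(-np.mean(rel_err) / tolerance))

def reward(points, center, size, weights=(0.5, 0.5)):
    """Weighted sum of the heuristics; higher should mean a more plausible box."""
    return (weights[0] * density_score(points, center, size)
            + weights[1] * size_score(size))

if __name__ == "__main__":
    # Toy scene: 30 points along a line near the origin, nothing elsewhere.
    pts = np.stack([np.linspace(-1.0, 1.0, 30), np.zeros(30), np.zeros(30)], axis=1)
    car = (4.5, 1.9, 1.6)
    print("box on object:", reward(pts, (0.0, 0.0, 0.0), car))
    print("box on empty space:", reward(pts, (20.0, 0.0, 0.0), car))
```

In this toy scene the box placed over the points receives a higher reward than the same box placed over empty space, which is the property the abstract asks of the reward function: boxes containing objects score higher than those without.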
Taxonomy
Topics: Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI) · Machine Learning and Algorithms
Methods: ALIGN
