Sequential Voting with Relational Box Fields for Active Object Detection
Qichen Fu, Xingyu Liu, Kris M. Kitani

TL;DR
This paper introduces a pixel-wise voting function and an associated Relational Box Field for active object detection. Sequential voting, trained with reinforcement learning, improves localization accuracy, with reported AP50 gains of 8% on 100DOH and 30% on MECCANO.
Contribution
The paper proposes a pixel-wise voting function and the Relational Box Field for active object detection. The voting function is applied iteratively to refine bounding boxes and is trained with reinforcement learning.
Findings
Improved AP50 by 8% on the 100DOH dataset.
Achieved a 30% AP50 improvement on the MECCANO dataset.
Demonstrated the effectiveness of reinforcement learning (RL) for training the voting function.
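For readers unfamiliar with the metric: AP50 is average precision where a predicted box counts as correct only if its intersection-over-union (IoU) with the ground-truth box is at least 0.5. The following is a minimal sketch of that matching criterion only. The function names and the `(x0, y0, x1, y1)` box layout are assumptions made for illustration, and the rest of the AP computation (ranking detections by confidence and integrating the precision-recall curve) is omitted.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x0, y0, x1, y1)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def is_match_at_50(pred_box, gt_box):
    """True if the prediction would count as a hit under the AP50 criterion."""
    return iou(pred_box, gt_box) >= 0.5
```

For example, two 10x10 boxes offset horizontally by 2 pixels overlap with IoU 80/120, which passes the threshold, while an offset of 5 pixels gives 50/150 and fails.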
Abstract
A key component of understanding hand-object interactions is the ability to identify the active object -- the object that is being manipulated by the human hand. In order to accurately localize the active object, any method must reason using information encoded by each image pixel, such as whether it belongs to the hand, the object, or the background. To leverage each pixel as evidence to determine the bounding box of the active object, we propose a pixel-wise voting function. Our pixel-wise voting function takes an initial bounding box as input and produces an improved bounding box of the active object as output. The voting function is designed so that each pixel inside of the input bounding box votes for an improved bounding box, and the box with the majority vote is selected as the output. We call the collection of bounding boxes generated inside of the voting function, the…
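The voting idea described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes a hypothetical array `box_field` of shape `(H, W, 4)` that already holds, for every pixel, the integer box `(x0, y0, x1, y1)` that pixel votes for. In the paper this field would come from a learned model, and the RL training is not shown here. The majority is taken by counting identical boxes, and the names `vote_once` and `sequential_vote` are illustrative.

```python
from collections import Counter

import numpy as np


def vote_once(box, box_field):
    """One voting round: each pixel inside `box` votes for a box; majority wins."""
    x0, y0, x1, y1 = box
    # Gather the boxes proposed by the pixels inside the current box.
    votes = box_field[y0:y1, x0:x1].reshape(-1, 4)
    if len(votes) == 0:
        return box  # Empty box: nothing to vote with, keep the input.
    counts = Counter(map(tuple, votes.tolist()))
    winner, _ = counts.most_common(1)[0]
    return winner


def sequential_vote(box, box_field, max_rounds=5):
    """Apply the voting function repeatedly until the box stops changing."""
    for _ in range(max_rounds):
        new_box = vote_once(box, box_field)
        if new_box == box:
            break
        box = new_box
    return box


# Toy 8x8 field: most pixels vote for (2, 2, 6, 6), a corner patch for (1, 1, 5, 5).
field = np.empty((8, 8, 4), dtype=int)
field[:] = (2, 2, 6, 6)
field[0:3, 0:3] = (1, 1, 5, 5)

refined = sequential_vote((0, 0, 4, 4), field)  # (2, 2, 6, 6) for this toy field
```

In this toy case the first round moves the box to `(1, 1, 5, 5)` because the corner patch holds 9 of the 16 votes. The second round then moves it to `(2, 2, 6, 6)`, where it stays. This shows why repeating the vote can help when the initial box is poorly placed.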
Taxonomy
Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Robot Manipulation and Learning
