OPDMulti: Openable Part Detection for Multiple Objects
Xiaohao Sun, Hanxiao Jiang, Manolis Savva, Angel Xuan Chang

TL;DR
This paper introduces OPDFormer, a part-aware transformer architecture for detecting openable parts and predicting their motion parameters in single-view images that contain multiple objects. It moves beyond the single-object assumption of prior work and contributes a new dataset based on real-world scenes.
Contribution
The work extends openable part detection to multi-object scenes, introduces a new dataset, and proposes a novel transformer architecture that outperforms previous methods.
Findings
OPDFormer significantly outperforms prior methods.
Multi-object scenarios remain challenging for current approaches.
A new dataset based on real-world scenes was created.
Abstract
Openable part detection is the task of detecting the openable parts of an object in a single-view image, and predicting corresponding motion parameters. Prior work investigated the unrealistic setting where all input images only contain a single openable object. We generalize this task to scenes with multiple objects each potentially possessing openable parts, and create a corresponding dataset based on real-world scenes. We then address this more challenging scenario with OPDFormer: a part-aware transformer architecture. Our experiments show that the OPDFormer architecture significantly outperforms prior work. The more realistic multiple-object scenarios we investigated remain challenging for all methods, indicating opportunities for future work.
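To make the task output concrete, the sketch below shows one way a per-image result could be represented: a list of detected parts, each carrying its own motion parameters. The field names, the part labels, the two motion types, and the score threshold are illustrative assumptions and are not taken from the paper or its code.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class OpenablePartPrediction:
    """One detected openable part (hypothetical schema, for illustration only)."""
    part_label: str                               # assumed labels, e.g. "drawer", "door"
    box_xyxy: Tuple[float, float, float, float]   # 2D box in pixel coordinates
    motion_type: str                              # assumed: "rotation" or "translation"
    motion_axis: Vec3                             # direction of the motion axis
    motion_origin: Vec3                           # a point on the axis
    score: float                                  # detection confidence in [0, 1]


def normalize_axis(axis: Vec3) -> Vec3:
    """Scale an axis direction to unit length."""
    norm = sum(c * c for c in axis) ** 0.5
    if norm == 0:
        raise ValueError("motion axis must be non-zero")
    return (axis[0] / norm, axis[1] / norm, axis[2] / norm)


def keep_confident(
    preds: List[OpenablePartPrediction], threshold: float = 0.5
) -> List[OpenablePartPrediction]:
    """Keep predictions whose score meets the (assumed) threshold."""
    return [p for p in preds if p.score >= threshold]


# A multi-object image yields several parts, possibly from different objects.
predictions = [
    OpenablePartPrediction("drawer", (10, 20, 110, 80), "translation",
                           normalize_axis((0.0, 0.0, 2.0)), (0.0, 0.0, 0.0), 0.9),
    OpenablePartPrediction("door", (200, 40, 320, 300), "rotation",
                           normalize_axis((0.0, 1.0, 0.0)), (0.3, 0.0, 0.1), 0.3),
]
confident = keep_confident(predictions)
```

In the multiple-object setting, a single image produces such a list with parts that may belong to different objects, so each part carries its own motion parameters.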
Taxonomy
Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · Human Pose and Action Recognition
