Searching for Objects using Structure in Indoor Scenes
Varun K. Nagaraja, Vlad I. Morariu, Larry S. Davis

TL;DR
This paper introduces a search method for indoor object detection that exploits scene structure: it explores image regions sequentially under a learned strategy, so that objects can be located while processing far fewer regions than an exhaustive pass over the image.
Contribution
It presents a novel search technique modeled as a Markov decision process and trained via imitation learning, leveraging scene and object context for efficient indoor object detection.
Findings
For certain classes, high average precision is reached after processing only 20-25% of the image regions
Scene context alone is highly effective for object detection
Object-object context further improves detection performance
Abstract
To identify the location of objects of a particular class, a passive computer vision system generally processes all the regions in an image and finally outputs only a few of them. However, we can use structure in the scene to search for objects without processing the entire image. We propose a search technique that processes image regions sequentially, so that the regions more likely to correspond to an object of the query class are explored earlier. We frame the problem as a Markov decision process and use an imitation learning algorithm to learn a search strategy. Since structure in the scene is essential for search, we work with indoor scene images, as they contain both unary scene context information and object-object context. We perform experiments on the NYU-depth v2 dataset and show that the unary scene context features alone can achieve a significantly high average…
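The sequential search described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the greedy rule below stands in for the learned policy, and the names `sequential_search`, `prior_score`, `detect`, and `context_update`, along with all numbers in the toy example, are hypothetical. The sketch only shows the control flow: pick the most promising unexplored region, run the expensive detector on it, then use the result as context to re-score the remaining regions.

```python
from typing import Callable, List, Sequence, Tuple


def sequential_search(
    prior_score: Sequence[float],
    detect: Callable[[int], float],
    context_update: Callable[[int, float, int], float],
    budget: int,
) -> List[Tuple[int, float]]:
    """Explore at most `budget` regions, most promising first.

    prior_score    -- cheap per-region score (stand-in for scene context)
    detect         -- expensive detector, called only on explored regions
    context_update -- score change for region r after exploring another
                      region (stand-in for object-object context)
    """
    scores = list(prior_score)
    unexplored = set(range(len(scores)))
    explored: List[Tuple[int, float]] = []

    while unexplored and len(explored) < budget:
        # Greedy stand-in for the learned policy: take the highest-scored
        # unexplored region, breaking ties by the lower region index.
        best = max(unexplored, key=lambda r: (scores[r], -r))
        unexplored.remove(best)

        detection = detect(best)  # the costly step we want to call rarely
        explored.append((best, detection))

        # Re-score the remaining regions using what was just observed.
        for r in unexplored:
            scores[r] += context_update(best, detection, r)

    return explored


# Toy example (all values hypothetical): five regions laid out on a line,
# with the query object in region 2.
prior = [0.1, 0.5, 0.2, 0.3, 0.05]
detector_output = [0.0, 0.4, 0.9, 0.1, 0.0]


def toy_detect(region: int) -> float:
    return detector_output[region]


def toy_context(explored_region: int, detection: float, region: int) -> float:
    # Neighbouring regions get a boost proportional to the detection score.
    return 0.5 * detection if abs(region - explored_region) == 1 else 0.0


result = sequential_search(prior, toy_detect, toy_context, budget=3)
order = [region for region, _ in result]
best_region = max(result, key=lambda pair: pair[1])[0]
```

With a budget of three out of five regions, the toy search visits regions 1, 2, and 3 in that order and finds the object in region 2 without ever running the detector on regions 0 and 4. In the paper's setting the hand-written greedy rule would be replaced by a policy learned through imitation learning.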
