Straight to Shapes: Real-time Detection of Encoded Shapes
Saumya Jetley, Michael Sapienza, Stuart Golodetz, Philip H.S. Torr

TL;DR
This paper introduces a real-time object detection method that predicts object shapes directly, using a shape embedding space to improve instance-specific understanding and generalization to unseen categories.
Contribution
It presents what the authors believe to be the first real-time shape prediction network: one that integrates shape encoding with object detection, enabling higher-order shape reasoning in a fast, end-to-end manner.
Findings
Runs at ~35 FPS on a high-end desktop
Generalizes to unseen categories effectively
Provides richer object instance information beyond bounding boxes
Abstract
Current object detection approaches predict bounding boxes, but these provide little instance-specific information beyond location, scale and aspect ratio. In this work, we propose to directly regress to objects' shapes in addition to their bounding boxes and categories. It is crucial to find an appropriate shape representation that is compact and decodable, and in which objects can be compared for higher-order concepts such as view similarity, pose variation and occlusion. To achieve this, we use a denoising convolutional auto-encoder to establish an embedding space, and place the decoder after a fast end-to-end network trained to regress directly to the encoded shape vectors. This yields what to the best of our knowledge is the first real-time shape prediction network, running at ~35 FPS on a high-end desktop. With higher-order shape reasoning well-integrated into the network…
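The inference-time data flow described in the abstract (a detector regresses an encoded shape vector per box, and a decoder placed after it turns that vector back into a mask) can be sketched roughly as follows. This is only an illustration under stated assumptions: the linear decoder over fixed basis masks is a stand-in for the decoder half of the denoising convolutional auto-encoder, and the names `decode_shape`, `paste_mask`, `BASIS`, the 2x2 mask size, the 2-dimensional code, the example detection, and the step of pasting the mask into the box are all hypothetical and not taken from the paper.

```python
def decode_shape(code, basis, threshold=0.5):
    """Stand-in decoder: weighted sum of basis masks, then threshold to binary.

    Assumption: the paper's decoder is convolutional; a linear one is used
    here only to keep the sketch short.
    """
    size = len(basis[0])
    mask = [[0.0] * size for _ in range(size)]
    for weight, template in zip(code, basis):
        for r in range(size):
            for c in range(size):
                mask[r][c] += weight * template[r][c]
    return [[1 if v >= threshold else 0 for v in row] for row in mask]


def paste_mask(mask, box, image_h, image_w):
    """Nearest-neighbour resize of a square mask into box = (x0, y0, x1, y1).

    Returns a blank image-sized canvas with the mask drawn inside the box.
    Assumes x1 > x0 and y1 > y0; pixels outside the image are clipped.
    """
    x0, y0, x1, y1 = box
    size = len(mask)
    canvas = [[0] * image_w for _ in range(image_h)]
    for y in range(max(y0, 0), min(y1, image_h)):
        for x in range(max(x0, 0), min(x1, image_w)):
            r = (y - y0) * size // (y1 - y0)
            c = (x - x0) * size // (x1 - x0)
            canvas[y][x] = mask[r][c]
    return canvas


# Hypothetical basis masks spanning a toy 2-D embedding space.
BASIS = [
    [[1, 0], [1, 0]],  # "left half" template
    [[1, 1], [0, 0]],  # "top half" template
]

# Hypothetical detector output: box, category, and regressed shape code.
detection = {"box": (1, 1, 5, 5), "label": "example", "shape_code": [0.9, 0.1]}

mask = decode_shape(detection["shape_code"], BASIS)
canvas = paste_mask(mask, detection["box"], image_h=6, image_w=6)
```

Because the detector only has to regress a short vector per box, the decoder runs as a separate stage after detection in this sketch.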
Taxonomy
Topics: Advanced Neural Network Applications · Human Pose and Action Recognition · Advanced Image and Video Retrieval Techniques
