TL;DR
This paper introduces RackLay, a real-time deep neural network for multi-layer layout estimation of warehouse racks from a single image, enabling 3D reasoning and space estimation in cluttered scenes.
Contribution
We propose RackLay, a novel neural network architecture for multi-layer warehouse rack layout estimation from monocular images, and release WareSynth, a synthetic dataset generation pipeline.
Findings
RackLay accurately estimates multi-layer layouts across diverse scenes.
Fusing the top-view and front-view layouts of each shelf enables 3D reasoning about the rack.
Our ablations and comparisons validate RackLay's effectiveness.
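To make the fusion idea concrete, here is a minimal sketch of how a top-view and a front-view layout for one shelf could be combined into a coarse 3D occupancy grid and a free-space estimate. It is an illustration under stated assumptions, not RackLay's actual procedure: the layouts are assumed to be boolean masks that share a width axis, the intersection rule is a simple silhouette intersection, and the names `fuse_views` and `free_fraction` are hypothetical.

```python
import numpy as np


def fuse_views(top_view: np.ndarray, front_view: np.ndarray) -> np.ndarray:
    """Combine two orthographic occupancy masks into a voxel grid.

    Assumption (not from the paper): top_view has shape (depth, width),
    front_view has shape (height, width), and both are boolean masks on
    the same width discretisation. A voxel (y, z, x) is marked occupied
    only if it is occupied in both projections.
    """
    top = np.asarray(top_view, dtype=bool)
    front = np.asarray(front_view, dtype=bool)
    if top.ndim != 2 or front.ndim != 2:
        raise ValueError("both views must be 2D masks")
    if top.shape[1] != front.shape[1]:
        raise ValueError("views must share the same width axis")
    # Broadcast (H, 1, W) against (1, D, W) to get an (H, D, W) grid.
    return front[:, None, :] & top[None, :, :]


def free_fraction(occupancy: np.ndarray) -> float:
    """Fraction of voxels in the shelf volume that are unoccupied."""
    return 1.0 - float(occupancy.mean())


# Toy shelf: 4 cells wide, 2 deep, 2 high, with one box on the left half.
top = np.zeros((2, 4), dtype=bool)
top[:, :2] = True      # box spans the full depth, left half of the width
front = np.zeros((2, 4), dtype=bool)
front[0, :2] = True    # box fills row 0 of the height axis, left half

occ = fuse_views(top, front)
print(occ.shape, int(occ.sum()), free_fraction(occ))
```

The intersection over-estimates occupancy when several boxes overlap along the width axis, so it should be read as an upper bound on occupied volume rather than an exact reconstruction.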
Abstract
Given a monocular colour image of a warehouse rack, we aim to predict the bird's-eye view layout for each shelf in the rack, which we term multi-layer layout prediction. To this end, we present RackLay, a deep neural network for real-time shelf layout estimation from a single image. Unlike previous layout estimation methods, which provide a single layout for the dominant ground plane alone, RackLay estimates the top-view and front-view layouts for each shelf in the considered rack populated with objects. RackLay's architecture and its variants are versatile and estimate accurate layouts for diverse scenes characterized by a varying number of visible shelves in an image, a large range in shelf occupancy factor, and varied background clutter. Given the extreme paucity of datasets in this space and the difficulty involved in acquiring real data from warehouses, we additionally release a…
