Preference-Conditioned Reinforcement Learning for Space-Time Efficient Online 3D Bin Packing
Nikita Sarawgi, Omey M. Manyar, Fan Wang, Thinh H. Nguyen, Daniel Seita, Satyandra K. Gupta

TL;DR
This paper introduces STEP, a preference-conditioned reinforcement learning approach for online 3D bin packing that balances space utilization and operational time, significantly reducing packing time while maintaining density.
Contribution
We propose a novel Transformer-based RL method that explicitly reasons over space-time trade-offs, enabling more efficient and adaptable robotic bin packing.
Findings
44% reduction in operational time
Maintains packing density comparable to existing methods
Generalizes across candidate set sizes
Abstract
Robotic bin packing is widely deployed in warehouse automation, with current systems achieving robust performance through heuristic and learning-based strategies. These systems must balance compact placement with rapid execution, where selecting alternative items or reorienting them can improve space utilization but introduce additional time. We propose a selection-based formulation that explicitly reasons over this trade-off: at each step, the robot evaluates multiple candidate actions, weighing expected packing benefit against estimated operational time. This enables time-aware strategies that selectively accept increased operational time when it yields meaningful spatial improvements. Our method, STEP (Space-Time Efficient Packing), uses a preference-conditioned, Transformer-based reinforcement learning policy, and allows generalization across candidate set sizes and integration with…
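The trade-off the abstract describes, weighing expected packing benefit against estimated operational time over a set of candidate actions, can be illustrated with a toy scoring rule. The sketch below is not the STEP policy, which the abstract describes as a learned, preference-conditioned Transformer-based RL policy. It substitutes a hand-written linear scalarization to show how a preference weight might shift the choice among candidates. All names (`Candidate`, `select_candidate`, `w_space`, `max_time`) and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One candidate action (hypothetical fields, not taken from the paper)."""
    name: str
    space_gain: float  # assumed estimate of packing benefit, in [0, 1]
    time_cost: float   # assumed estimate of operational time, in seconds


def select_candidate(candidates, w_space, max_time):
    """Pick the candidate with the best preference-weighted score.

    Assumed scoring rule (linear scalarization, an illustration only):
        score = w_space * space_gain - (1 - w_space) * (time_cost / max_time)
    w_space near 1 favors compact packing; w_space near 0 favors speed.
    """
    if not 0.0 <= w_space <= 1.0:
        raise ValueError("w_space must lie in [0, 1]")
    if max_time <= 0:
        raise ValueError("max_time must be positive")
    if not candidates:
        raise ValueError("candidates must be non-empty")

    def score(c):
        return w_space * c.space_gain - (1.0 - w_space) * (c.time_cost / max_time)

    # max() works for any number of candidates, so the rule itself
    # does not depend on the candidate set size.
    return max(candidates, key=score)


# Made-up candidates: placing the item as presented, reorienting it,
# or picking an alternative item that fits better but takes longer.
CANDIDATES = [
    Candidate("direct_place", space_gain=0.50, time_cost=4.0),
    Candidate("reorient", space_gain=0.60, time_cost=8.0),
    Candidate("alternative_item", space_gain=0.80, time_cost=10.0),
]

space_first = select_candidate(CANDIDATES, w_space=0.9, max_time=10.0)
time_first = select_candidate(CANDIDATES, w_space=0.3, max_time=10.0)
print(space_first.name, time_first.name)
```

With these made-up numbers, a space-leaning preference (`w_space=0.9`) selects the slower alternative item, while a time-leaning preference (`w_space=0.3`) selects the direct placement. A learned policy conditioned on the preference would replace the fixed formula, but the input (a candidate set plus a preference) and the output (one selected candidate) have the same shape.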
Taxonomy
Topics: Optimization and Packing Problems · Advanced Manufacturing and Logistics Optimization · Scheduling and Optimization Algorithms
