TL;DR
SABER is a retail robotics action dataset built from over 100 hours of natural in-store capture. It records fine-grained human hand activity and scene dynamics so that manipulation models can improve without extensive teleoperation.
Contribution
The paper introduces SABER, a large-scale, multi-modal retail robotics dataset collected in multiple real grocery environments. It targets the retail data gap in general robot pretraining distributions.
Findings
Applying SABER to GR00T N1.6 improves success rate by over 2x compared to baselines.
SABER captures diverse, high-fidelity action data without staged or scripted procedures.
The dataset enables training models that perform better in complex retail manipulation tasks.
Abstract
Robotic deployment in real-world environments depends on rich, domain-specific action data as much as on strong model architecture. General-purpose robot foundation models show modest performance in complex unseen tasks such as manipulation in a retail domain when applied out of the box. The root cause is a data gap: retail environments are structurally absent from general robot pretraining distributions, and the path to filling that gap through teleoperation is prohibitively expensive, logistically constrained, and difficult to scale. We introduce SABER, a high-fidelity retail robotics action dataset built from over 100 hours of natural in-store capture across multiple real grocery environments. Egocentric footage from head-mounted cameras records fine-grained hand activity at the point of interaction, while exocentric 360-degree scene footage from DreamVu's ALIA camera simultaneously…
