PRISM: A Multi-View Multi-Capability Retail Video Dataset for Embodied Vision-Language Models