Desk Organization: Effect of Multimodal Inputs on Spatial Relational Learning

Ryan Rowe, Shivam Singhal, Daqing Yi, Tapomayukh Bhattacharjee, and Siddhartha S. Srinivasa

arXiv:2108.01254·cs.RO·August 4, 2021


TL;DR

This paper investigates how multimodal sensory inputs, including vision, haptics, and perceived utility, influence the learning of spatial object arrangements on desks, using models trained on synthetic and human data.

Contribution

It introduces a multimodal approach that combines vision, haptics, and utility to model human desk organization preferences. The resulting models are interpretable and achieve high accuracy, with random forests exceeding 90% on human data.

Findings

01. Random forests achieved over 90% accuracy on human data.

02. The utility-vision (UV) and haptics-utility-vision (HUV) modality combinations were the most informative for organization.

03. Participants preferred the random-forest-based models over random models.

Abstract

For robots to operate in a three-dimensional world and interact with humans, they must learn the spatial relationships among objects in their surroundings. Reasoning about the state of the world requires inputs from many different sensory modalities, including vision ($V$) and haptics ($H$). We examine the problem of desk organization: learning how humans spatially position different objects on a planar surface according to their organizational "preference". We model this problem by examining how humans position objects given multiple features received from the vision and haptic modalities. However, organizational habits vary greatly between people, both in structure and in adherence. To account for individual organizational preferences, we add a further modality, "utility" ($U$), which captures a particular person's perceived usefulness of a given object. Models were trained as generalized (over many…
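The abstract describes each object through features from vision ($V$), haptics ($H$), and utility ($U$), and the findings compare modality combinations such as UV and HUV. The minimal Python sketch below shows one way such combinations could be enumerated and turned into per-object feature vectors. It is an illustration under stated assumptions, not the authors' pipeline: the helper names `modality_subsets` and `build_features`, the feature names, and the example values are all hypothetical.

```python
from itertools import combinations

# Modality codes used in the paper: haptics (H), utility (U), vision (V).
MODALITIES = ("H", "U", "V")


def modality_subsets(modalities=MODALITIES):
    """Return every non-empty modality combination, e.g. 'UV' or 'HUV'.

    Hypothetical helper: enumerates subsets in order of increasing size.
    """
    subsets = []
    for size in range(1, len(modalities) + 1):
        for combo in combinations(modalities, size):
            subsets.append("".join(combo))
    return subsets


def build_features(obj, subset):
    """Concatenate the feature lists of the modalities named in `subset`.

    Hypothetical helper: `obj` maps a modality code to a list of numbers.
    """
    vector = []
    for modality in subset:
        vector.extend(obj[modality])
    return vector


# One hypothetical object. Assumed features (not from the paper):
#   H: [weight_kg, rigidity], U: [usefulness_score], V: [width_m, depth_m]
mug = {"H": [0.35, 1.0], "U": [0.9], "V": [0.08, 0.08]}

if __name__ == "__main__":
    for subset in modality_subsets():
        print(subset, build_features(mug, subset))
```

Under these assumptions, a classifier such as a random forest could then be trained separately on the vectors from each subset so that combinations like UV and HUV can be compared; that training step is not shown here.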
