Octopi: Object Property Reasoning with Large Tactile-Language Models
Samson Yu, Kelvin Lin, Anxing Xiao, Jiafei Duan, Harold Soh

TL;DR
This paper introduces Octopi, a system that combines tactile perception with language models for physical reasoning in robots, supported by PhysiCLeAR, a new dataset of annotated tactile videos and physical property reasoning tasks.
Contribution
It presents a novel approach integrating tactile data with large vision-language models and introduces a new dataset for physical property reasoning tasks.
Findings
Octopi improves physical property reasoning accuracy.
Tactile data gives access to physical properties that cannot be inferred from visual inputs alone.
The dataset enables diverse tactile-related task evaluation.
Abstract
Physical reasoning is important for effective robot manipulation. Recent work has investigated both vision and language modalities for physical reasoning; vision can reveal information about objects in the environment and language serves as an abstraction and communication medium for additional context. Although these works have demonstrated success on a variety of physical reasoning tasks, they are limited to physical properties that can be inferred from visual or language inputs. In this work, we investigate combining tactile perception with language, which enables embodied systems to obtain physical properties through interaction and apply commonsense reasoning. We contribute a new dataset PhysiCLeAR, which comprises both physical/property reasoning tasks and annotated tactile videos obtained using a GelSight tactile sensor. We then introduce Octopi, a system that leverages both…
Taxonomy
Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · BIM and Construction Integration
