Visuo-Tactile Zero-Shot Object Recognition with Vision-Language Model

Shiori Ueda; Atsushi Hashimoto; Masashi Hamaya; Kazutoshi Tanaka; Hideo Saito

arXiv:2409.09276·cs.RO·September 17, 2024

TL;DR

This paper integrates tactile data into a vision-language model (VLM) to improve zero-shot object recognition, especially for objects that look alike. Training requires only an object-name annotation for each tactile sequence, which keeps training costs low, and the method is evaluated on the FoodReplica and Cube datasets.

Contribution

It presents an approach for incorporating tactile information into vision-language models for zero-shot recognition: tactile data is translated into a textual description, and the only training annotation required is an object name for each tactile sequence.

Findings

01 Effective recognition of visually similar objects

02 Low training cost due to textual annotation approach

03 Demonstrated success on FoodReplica and Cube datasets

Abstract

Tactile perception is vital, especially when distinguishing visually similar objects. We propose an approach to incorporate tactile data into a Vision-Language Model (VLM) for visuo-tactile zero-shot object recognition. Our approach leverages the zero-shot capability of VLMs to infer tactile properties from the names of tactilely similar objects. The proposed method translates tactile data into a textual description solely by annotating object names for each tactile sequence during training, making it adaptable to various contexts with low training costs. The proposed method was evaluated on the FoodReplica and Cube datasets, demonstrating its effectiveness in recognizing objects that are difficult to distinguish by vision alone.
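
The idea of describing touch through the names of tactilely similar objects can be illustrated with a minimal sketch. This is not the authors' implementation: the toy embeddings, the nearest-neighbor retrieval by cosine similarity, the prompt wording, and the helper names `tactilely_similar_names` and `tactile_description` are all assumptions made for illustration. In a full pipeline, the resulting sentence would be passed to a VLM alongside the image.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def tactilely_similar_names(query, bank, k=2):
    """Return up to k distinct object names whose stored tactile
    embeddings are closest to the query (hypothetical retrieval step)."""
    ranked = sorted(
        bank,
        key=lambda item: cosine_similarity(query, item[0]),
        reverse=True,
    )
    names = []
    for _, name in ranked:
        if name not in names:
            names.append(name)
        if len(names) == k:
            break
    return names


def tactile_description(names):
    """Turn retrieved object names into a textual tactile description.
    The sentence template is an assumption, not taken from the paper."""
    return "The object feels similar to: " + ", ".join(names) + "."


# Toy training bank: (tactile embedding, annotated object name).
# Real embeddings would come from a tactile encoder; these are made up.
bank = [
    ([0.9, 0.1, 0.0], "sponge"),
    ([0.8, 0.2, 0.1], "marshmallow"),
    ([0.0, 0.1, 0.9], "steel block"),
    ([0.1, 0.0, 0.8], "glass cube"),
]

query = [0.85, 0.15, 0.05]  # embedding of a new, unseen tactile sequence
names = tactilely_similar_names(query, bank, k=2)
prompt = tactile_description(names)
print(prompt)
```

Because the description is plain text built from object names, the VLM can draw on what it already knows about those objects (for example, that a sponge is soft) without any tactile-specific fine-tuning of the VLM itself.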

Taxonomy

Topics: Industrial Vision Systems and Defect Detection · Tactile and Sensory Interactions · Visual Attention and Saliency Detection