HapticLLaMA: A Multimodal Sensory Language Model for Haptic Captioning

Guimin Hu; Daniel Hershcovich; Hasti Seifi

arXiv:2508.06475 · cs.CL · January 15, 2026

TL;DR

HapticLLaMA is a multimodal language model that translates vibration signals into natural language descriptions. It advances haptic captioning for virtual reality, accessibility, and rehabilitation, and produces captions that align more closely with human ratings.

Contribution

The paper introduces HapticLLaMA, the first large language model designed for haptic captioning. It combines two haptic tokenizers (frequency-based and EnCodec-based) with reinforcement learning from human feedback (RLHF) to improve how well the model interprets sensory signals.

Findings

01

Achieved a METEOR score of 59.98 and a BLEU-4 score of 32.06.

02

Over 61% of generated captions were rated above 3.5 by human evaluators.

03

RLHF improved the overall human rating distribution by 10%.
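
To make the BLEU-4 figure in the first finding concrete, here is a minimal sketch of sentence-level BLEU-4: the geometric mean of clipped 1- to 4-gram precisions, multiplied by a brevity penalty and scaled to 0-100. The function names `ngrams` and `bleu4`, the whitespace tokenization, the single reference, and the lack of smoothing are all assumptions made for illustration. The paper's reported score presumably comes from a standard corpus-level implementation, which this sketch does not reproduce.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, reference):
    """Sentence-level BLEU-4 on a 0-100 scale (illustrative, unsmoothed)."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, 5):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any empty precision gives zero
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return 100.0 * bp * math.exp(sum(log_precisions) / 4)


# Hypothetical captions, not taken from the paper's data.
score_same = bleu4("a soft steady pulsing buzz", "a soft steady pulsing buzz")
score_diff = bleu4("a sharp quick tap", "slow rolling ocean waves")
```

An exact match scores 100 and a caption with no shared words scores 0. Real captions fall in between, which is why a corpus-level score in the low 30s can still correspond to captions that humans rate favorably.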

Abstract

Haptic captioning is the task of generating natural language descriptions from haptic signals, such as vibrations, for use in virtual reality, accessibility, and rehabilitation applications. While previous multimodal research has focused primarily on vision and audio, haptic signals for the sense of touch remain underexplored. To address this gap, we formalize the haptic captioning task and propose HapticLLaMA, a multimodal sensory language model that interprets vibration signals into descriptions in a given sensory, emotional, or associative category. We investigate two types of haptic tokenizers, a frequency-based tokenizer and an EnCodec-based tokenizer, that convert haptic signals into sequences of discrete units, enabling their integration with the LLaMA model. HapticLLaMA is trained in two stages: (1) supervised fine-tuning using the LLaMA architecture with LoRA-based adaptation,…
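
The abstract does not spell out how the frequency-based tokenizer works, so the following is only a rough sketch of the general idea of turning a vibration signal into discrete units. It splits the signal into frames, finds each frame's dominant frequency with a naive DFT, and quantizes that frequency into a token id. The function names, the frame size, the number of tokens, and the 500 Hz ceiling are all hypothetical choices, not the authors' design.

```python
import cmath
import math


def dominant_bin(frame):
    """Return the index of the strongest non-DC DFT bin of one frame."""
    n = len(frame)
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k


def frequency_tokens(signal, sample_rate, frame_size=64, n_tokens=16, max_hz=500.0):
    """Map each non-overlapping frame to a token id from its dominant frequency.

    All parameter defaults are assumptions chosen for this example.
    """
    tokens = []
    for start in range(0, len(signal) - frame_size + 1, frame_size):
        frame = signal[start:start + frame_size]
        hz = dominant_bin(frame) * sample_rate / frame_size
        # Uniformly quantize [0, max_hz) into n_tokens ids, clamping the top.
        tokens.append(min(int(hz / max_hz * n_tokens), n_tokens - 1))
    return tokens


# Hypothetical input: 128 samples of a 125 Hz sine sampled at 1 kHz.
sample_rate = 1000
signal = [math.sin(2 * math.pi * 125 * t / sample_rate) for t in range(128)]
tokens = frequency_tokens(signal, sample_rate)  # two frames, both in token bin 4
```

A token sequence like this could then be mapped to new vocabulary entries so that a language model reads it alongside text. A learned codec such as EnCodec takes the place of the hand-built quantizer in the second tokenizer the abstract mentions.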

Code & Models

Models

GuiminHu/HapticLLaMA (Hugging Face model)

Videos

HapticLLaMA: A Multimodal Sensory Language Model for Haptic Captioning · underline

Taxonomy

Topics: Speech and dialogue systems · Subtitles and Audiovisual Media · Hand Gesture Recognition Systems