GestLLM: Advanced Hand Gesture Interpretation via Large Language Models for Human-Robot Interaction

Oleg Kobzarev; Artem Lykov; Dzmitry Tsetserukou

arXiv:2501.07295 · cs.RO · January 16, 2025

Open Access

TL;DR

GestLLM leverages large language models and advanced feature extraction to interpret a wide range of hand gestures for more natural and inclusive human-robot interaction, surpassing limitations of traditional gesture recognition systems.

Contribution

Introduces GestLLM, a novel system combining large language models with MediaPipe for flexible, complex gesture interpretation without additional training or prompt engineering.

Findings

01. Achieves performance comparable to leading vision-language models.

02. Supports recognition of culturally and contextually diverse gestures.

03. Enhances naturalness and inclusivity in robot control.

Abstract

This paper introduces GestLLM, an advanced system for human-robot interaction that enables intuitive robot control through hand gestures. Unlike conventional systems, which rely on a limited set of predefined gestures, GestLLM leverages large language models and feature extraction via MediaPipe to interpret a diverse range of gestures. This integration addresses key limitations in existing systems, such as restricted gesture flexibility and the inability to recognize complex or unconventional gestures commonly used in human communication. By combining state-of-the-art feature extraction and language model capabilities, GestLLM achieves performance comparable to leading vision-language models while supporting gestures underrepresented in traditional datasets. For example, this includes gestures from popular culture, such as the "Vulcan salute" from Star Trek, without any additional…
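The abstract describes a pipeline in which hand features extracted by MediaPipe are passed to a large language model. As a rough illustration only, the sketch below shows one way 21 hand landmarks (the count and naming used by MediaPipe's hand model) could be serialized into text for a language model. The function name `landmarks_to_prompt`, the coordinate formatting, and the wording of the surrounding text are all assumptions made for this example; the listing does not show the authors' actual feature encoding or prompt, and the landmark extraction step itself is left out so the block stays dependency-free.

```python
from typing import List, Tuple

# One landmark as (x, y, z); MediaPipe reports x and y normalized to the image.
Landmark = Tuple[float, float, float]

# The 21 hand landmark names, in MediaPipe's index order.
LANDMARK_NAMES = [
    "WRIST",
    "THUMB_CMC", "THUMB_MCP", "THUMB_IP", "THUMB_TIP",
    "INDEX_FINGER_MCP", "INDEX_FINGER_PIP", "INDEX_FINGER_DIP", "INDEX_FINGER_TIP",
    "MIDDLE_FINGER_MCP", "MIDDLE_FINGER_PIP", "MIDDLE_FINGER_DIP", "MIDDLE_FINGER_TIP",
    "RING_FINGER_MCP", "RING_FINGER_PIP", "RING_FINGER_DIP", "RING_FINGER_TIP",
    "PINKY_MCP", "PINKY_PIP", "PINKY_DIP", "PINKY_TIP",
]


def landmarks_to_prompt(landmarks: List[Landmark]) -> str:
    """Serialize 21 hand landmarks into a plain-text description.

    Hypothetical helper: the header and footer wording are illustrative,
    not taken from the paper.
    """
    if len(landmarks) != len(LANDMARK_NAMES):
        raise ValueError(
            f"expected {len(LANDMARK_NAMES)} landmarks, got {len(landmarks)}"
        )
    # One line per landmark, rounded to three decimals to keep the text short.
    body = [
        f"{name}: x={x:.3f}, y={y:.3f}, z={z:.3f}"
        for name, (x, y, z) in zip(LANDMARK_NAMES, landmarks)
    ]
    header = "Hand landmarks (normalized image coordinates):"
    footer = "Describe the hand gesture these landmarks represent."
    return "\n".join([header, *body, footer])


if __name__ == "__main__":
    # Synthetic landmarks standing in for real MediaPipe output.
    fake_hand = [(i / 100, 0.5, 0.0) for i in range(21)]
    print(landmarks_to_prompt(fake_hand))
```

The resulting string could then be sent to any language model of choice; which model, and whether any such textual template is used at all, is not specified in this listing.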

Taxonomy

Topics: Hand Gesture Recognition Systems · Human Pose and Action Recognition · Robotics and Automated Systems