A neuro-symbolic approach for multimodal reference expression comprehension

Aman Jain; Anirudh Reddy Kondapally; Kentaro Yamada; Hitomi Yanaka

arXiv:2306.10717 · cs.HC · June 21, 2023 · 1 cites

Open Access

TL;DR

This paper presents an interpretable neuro-symbolic model for multimodal reference expression comprehension in HMI systems, integrating gestures and visual cues within a VR environment, emphasizing transparency and robustness.

Contribution

It introduces a novel neuro-symbolic approach that enhances interpretability and generalizability for multimodal reference understanding in human-machine interaction.

Findings

01 Model achieves high accuracy in object identification

02 Demonstrates robustness in unseen environments

03 Outperforms purely neural approaches in transparency

Abstract

Human-Machine Interaction (HMI) systems have gained huge interest in recent years, with reference expression comprehension being one of the main challenges. Traditionally, human-machine interaction has been mostly limited to speech and visual modalities. However, to allow for more freedom in interaction, recent works have proposed integrating additional modalities, such as gestures, into HMI systems. We consider such an HMI system with pointing gestures and construct a table-top object picking scenario inside a simulated virtual reality (VR) environment to collect data. Previous works on this task have used deep neural networks to classify the referred object, an approach that lacks transparency. In this work, we propose an interpretable and compositional model, crucial to building robust HMI systems for real-world application, based on a neuro-symbolic approach to tackle this task. Finally…

Taxonomy

Topics: Speech and dialogue systems · Hand Gesture Recognition Systems · Natural Language Processing Techniques