# Exploring Temporal Dependencies in Multimodal Referring Expressions with Mixed Reality

**Authors:** Elena Sibirtseva, Ali Ghadirzadeh, Iolanda Leite, Mårten Björkman, Danica Kragic

arXiv: 1902.01117 · 2019-02-05

## TL;DR

This paper presents a model that disambiguates multimodal referring expressions by modelling temporal dependencies across speech, hand gestures, and head movements, with the aim of making human-robot communication smoother and more natural. The model is studied using data from mixed reality experiments.

## Contribution

The paper introduces a Bayesian model that incorporates temporal dependencies between speech, hand gestures, and head movements to improve the disambiguation of referring expressions, using data collected in mixed reality experiments.
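To make the idea of Bayesian fusion across modalities concrete, here is a minimal sketch that combines per-modality evidence into a posterior over candidate objects. It assumes the three modalities are conditionally independent given the intended object (a naive-Bayes simplification). The function name `fuse_modalities` and all likelihood values are hypothetical and not taken from the paper, whose model may be structured differently.

```python
from math import prod

def fuse_modalities(prior, likelihoods):
    """Posterior P(object | speech, gesture, head) over candidate objects.

    prior: P(object_i) for each candidate object.
    likelihoods: maps a modality name to P(observation | object_i) per object.
    Assumes (hypothetically) that modalities are conditionally independent
    given the target object.
    """
    scores = [
        p * prod(lik[i] for lik in likelihoods.values())
        for i, p in enumerate(prior)
    ]
    total = sum(scores)  # normalising constant P(observations)
    return [s / total for s in scores]

prior = [1 / 3, 1 / 3, 1 / 3]  # three candidate objects, uniform prior
likelihoods = {
    "speech": [0.5, 0.4, 0.1],   # hypothetical: the words fit objects 0 and 1
    "gesture": [0.2, 0.7, 0.1],  # hypothetical: pointing lands closest to object 1
    "head": [0.3, 0.5, 0.2],     # hypothetical: head orientation favours object 1
}
posterior = fuse_modalities(prior, likelihoods)
print([round(p, 3) for p in posterior])  # [0.174, 0.814, 0.012]
```

Speech alone barely separates objects 0 and 1; multiplying in the gesture and head evidence concentrates the posterior on object 1.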

## Key findings

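- The model with a temporal prior outperforms the same model without it
- Modelling temporal dependencies between modalities improves disambiguation accuracy
- Analysis of the mixed reality data supports the hypothesis that modelling the timing of events across speech, gestures, and head movements increases predictive power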

## Abstract

In collaborative tasks, people rely both on verbal and non-verbal cues simultaneously to communicate with each other. For human-robot interaction to run smoothly and naturally, a robot should be equipped with the ability to robustly disambiguate referring expressions. In this work, we propose a model that can disambiguate multimodal fetching requests using modalities such as head movements, hand gestures, and speech. We analysed the acquired data from mixed reality experiments and formulated a hypothesis that modelling temporal dependencies of events in these three modalities increases the model's predictive power. We evaluated our model on a Bayesian framework to interpret referring expressions with and without exploiting a temporal prior.
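The comparison with and without a temporal prior can be illustrated with a toy example. In this sketch, each event's likelihood is down-weighted according to how far in time it lies from the spoken referent, by raising it to a power between 0 and 1 (likelihood tempering). The Gaussian weighting, its `mean_s`/`std_s` parameters, the function names, and the event values are all hypothetical choices for illustration; the paper derives its temporal prior from its own data analysis, and that prior may take a different form.

```python
import math

def temporal_weight(offset_s, mean_s=0.0, std_s=0.5):
    # Hypothetical Gaussian weight in (0, 1]: events close to the spoken referent
    # (offset near mean_s seconds) count fully, distant ones fade out.
    return math.exp(-0.5 * ((offset_s - mean_s) / std_s) ** 2)

def disambiguate(prior, events, use_temporal_prior=True):
    """Return the posterior over candidate objects.

    events: list of (likelihood_over_objects, offset_s) pairs, where offset_s is
    the event time relative to the spoken referent (negative = before speech).
    """
    scores = list(prior)
    for likelihood, offset_s in events:
        w = temporal_weight(offset_s) if use_temporal_prior else 1.0
        for i, p in enumerate(likelihood):
            # Tempering: p ** 1 keeps the evidence, p ** 0 turns it into a no-op.
            scores[i] *= p ** w
    total = sum(scores)
    return [s / total for s in scores]

prior = [1 / 3, 1 / 3, 1 / 3]
events = [
    ([0.45, 0.45, 0.10], 0.0),   # speech: "the cup" fits objects 0 and 1 equally
    ([0.80, 0.10, 0.10], 0.1),   # pointing gesture at object 0, aligned with speech
    ([0.05, 0.90, 0.05], -3.0),  # head turn toward object 1, 3 s before speaking
]
without_prior = disambiguate(prior, events, use_temporal_prior=False)
with_prior = disambiguate(prior, events, use_temporal_prior=True)
print(max(range(3), key=without_prior.__getitem__))  # 1
print(max(range(3), key=with_prior.__getitem__))  # 0
```

Without the temporal prior, a head turn made three seconds before the utterance outweighs the pointing gesture and the sketch picks object 1. With it, the stale head turn contributes almost nothing and object 0 wins.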

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.01117/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/1902.01117/full.md

## References

27 references — full list in the complete paper: https://tomesphere.com/paper/1902.01117/full.md

---
Source: https://tomesphere.com/paper/1902.01117