# Grounding Symbols in Multi-Modal Instructions

**Authors:** Yordan Hristov, Svetlin Penkov, Alex Lascarides, Subramanian Ramamoorthy

arXiv: 1706.00355 · 2017-06-02

## TL;DR

This paper introduces a method for grounding symbols in multi-modal instructions, enabling robots to associate words with physical objects from limited, user-specific data in a table-top manipulation setting.

## Contribution

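The paper presents a framework that processes a raw stream of cross-modal input (linguistic instructions, visual perception of a scene, and 3D eye-tracking fixations) to segment objects and associate them with high-level concepts, learning from small, user-specific datasets in a robotic setting.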

## Key findings

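- The model learns a user's notion of color and shape from a small number of physical demonstrations
- It generalizes to identifying physical referents for novel combinations of the learned words
- Experiments are carried out in a table-top object manipulation scenario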
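To make the idea of generalizing to novel word combinations concrete, here is a minimal sketch. It is not the paper's implementation: it assumes each demonstration has already been reduced to a set of words paired with the feature vector of the object being referred to (the paper obtains that pairing from eye-tracking fixations), and it swaps in a simple per-word, per-feature Gaussian model. The feature names `hue` and `roundness`, the helpers `fit_symbols` and `resolve`, and all numeric values are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical demonstrations: (words spoken, features of the referred object).
# "blue ball" is deliberately never demonstrated.
DEMOS = [
    (["red", "cube"], {"hue": 0.02, "roundness": 0.10}),
    (["red", "ball"], {"hue": 0.03, "roundness": 0.90}),
    (["green", "ball"], {"hue": 0.33, "roundness": 0.92}),
    (["green", "cone"], {"hue": 0.34, "roundness": 0.50}),
    (["blue", "cube"], {"hue": 0.60, "roundness": 0.12}),
    (["blue", "cone"], {"hue": 0.61, "roundness": 0.52}),
]

# Hypothetical scene: candidate objects with their measured features.
SCENE = {
    "blue_ball": {"hue": 0.62, "roundness": 0.89},
    "blue_cube": {"hue": 0.60, "roundness": 0.11},
    "red_ball": {"hue": 0.02, "roundness": 0.90},
    "green_cone": {"hue": 0.33, "roundness": 0.50},
}


def fit_symbols(demos, var_floor=1e-3):
    """Fit a (mean, variance) pair per word and per feature.

    A word ends up with low variance on the feature it constrains
    (e.g. "blue" on hue) and high variance on the one it does not.
    """
    samples = defaultdict(lambda: defaultdict(list))
    for words, feats in demos:
        for word in words:
            for name, value in feats.items():
                samples[word][name].append(value)
    model = {}
    for word, by_feature in samples.items():
        model[word] = {}
        for name, values in by_feature.items():
            mean = sum(values) / len(values)
            var = sum((v - mean) ** 2 for v in values) / len(values)
            model[word][name] = (mean, var + var_floor)
    return model


def log_likelihood(model, words, feats):
    """Sum of Gaussian log-densities over all words and features."""
    total = 0.0
    for word in words:  # raises KeyError for a word never demonstrated
        for name, (mean, var) in model[word].items():
            diff = feats[name] - mean
            total += -0.5 * (math.log(2 * math.pi * var) + diff * diff / var)
    return total


def resolve(model, words, scene):
    """Return the name of the scene object that best matches the words."""
    return max(scene, key=lambda obj: log_likelihood(model, words, scene[obj]))


if __name__ == "__main__":
    model = fit_symbols(DEMOS)
    print(resolve(model, ["blue", "ball"], SCENE))
```

Because each word is modeled independently, a combination that never appeared in the demonstrations can still be scored by summing the per-word log-likelihoods. This independence assumption is a simplification made for the sketch.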

## Abstract

As robots begin to cohabit with humans in semi-structured environments, the need arises to understand instructions involving rich variability, for instance, learning to ground symbols in the physical world. Realistically, this task must cope with small datasets consisting of a particular user's contextual assignment of meaning to terms. We present a method for processing a raw stream of cross-modal input (i.e., linguistic instructions, visual perception of a scene and a concurrent trace of 3D eye tracking fixations) to produce the segmentation of objects with a corresponding association to high-level concepts. To test our framework we present experiments in a table-top object manipulation scenario. Our results show our model learns the user's notion of colour and shape from a small number of physical demonstrations, generalising to identifying physical referents for novel combinations of the words.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1706.00355/full.md

## Figures

14 figures with captions in the complete paper: https://tomesphere.com/paper/1706.00355/full.md

## References

24 references — full list in the complete paper: https://tomesphere.com/paper/1706.00355/full.md

---
Source: https://tomesphere.com/paper/1706.00355