# Visually Grounded Neural Syntax Acquisition

**Authors:** Haoyue Shi, Jiayuan Mao, Kevin Gimpel, Karen Livescu

arXiv: 1906.02890 · 2019-09-26

## TL;DR

This paper introduces VG-NSL, a model that learns constituency structure from paired images and captions without explicit syntactic supervision. It outperforms unsupervised parsers that do not use visual grounding on MSCOCO, and prior unsupervised approaches on multiple languages in Multi30K.

## Contribution

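The paper presents a visually grounded neural model that learns syntax without explicit supervision. It uses image-caption pairs to induce constituency parses, and its results are reported to be much more stable with respect to random initialization and the amount of training data.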

## Key findings

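- On MSCOCO, VG-NSL outperforms unsupervised parsing approaches that do not use visual grounding, measured by F1 against gold parse trees.
- VG-NSL is much more stable with respect to the choice of random initialization and the amount of training data.
- The concreteness scores acquired by VG-NSL correlate well with a similar measure defined by linguists.
- On multiple languages in Multi30K, VG-NSL consistently outperforms prior unsupervised approaches.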
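
The parsing comparisons are reported as F1 against gold parse trees. As a rough illustration of that kind of metric, here is a minimal sketch of unlabeled bracket F1 for a single sentence. It assumes trees are nested tuples of word strings, and the names `spans` and `bracket_f1` are hypothetical. Evaluation conventions (whether the whole-sentence span counts, how scores are averaged over a corpus) vary between papers and are not taken from this one.

```python
def spans(tree, start=0):
    """Return (end_position, set of (start, end) spans) for a nested-tuple tree.

    Leaves are word strings; every tuple node contributes one span.
    """
    if isinstance(tree, str):
        return start + 1, set()
    pos, found = start, set()
    for child in tree:
        pos, child_spans = spans(child, pos)
        found |= child_spans
    found.add((start, pos))
    return pos, found


def bracket_f1(pred, gold):
    """Unlabeled bracket F1 between two trees over the same sentence.

    Assumption: the whole-sentence span is counted like any other span.
    """
    _, pred_spans = spans(pred)
    _, gold_spans = spans(gold)
    overlap = len(pred_spans & gold_spans)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_spans)
    recall = overlap / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    predicted = (("a", "cat"), ("sits", "down"))
    gold = ("a", ("cat", ("sits", "down")))
    print(bracket_f1(predicted, gold))
```

In the usage example the two trees share two of three spans each, so precision and recall are both 2/3.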

## Abstract

We present the Visually Grounded Neural Syntax Learner (VG-NSL), an approach for learning syntactic representations and structures without any explicit supervision. The model learns by looking at natural images and reading paired captions. VG-NSL generates constituency parse trees of texts, recursively composes representations for constituents, and matches them with images. We define concreteness of constituents by their matching scores with images, and use it to guide the parsing of text. Experiments on the MSCOCO data set show that VG-NSL outperforms various unsupervised parsing approaches that do not use visual grounding, in terms of F1 scores against gold parse trees. We find that VG-NSL is much more stable with respect to the choice of random initialization and the amount of training data. We also find that the concreteness acquired by VG-NSL correlates well with a similar measure defined by linguists. Finally, we also apply VG-NSL to multiple languages in the Multi30K data set, showing that our model consistently outperforms prior unsupervised approaches.
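
The idea of building a tree by recursively composing constituent representations and scoring them against an image can be sketched as a greedy bottom-up merge loop. This is a toy illustration under stated assumptions, not the authors' implementation. Composition is assumed to be a normalized sum of child vectors, the matching score is assumed to be cosine similarity with a single image vector, and the image similarity is used directly as the merge criterion. All names (`compose`, `greedy_parse`) and the toy vectors are hypothetical, and the training procedure is not reproduced.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def compose(left, right):
    """Assumed composition: L2-normalized sum of the two child vectors."""
    return normalize([a + b for a, b in zip(left, right)])


def cosine(u, v):
    """Dot product; equals cosine similarity because inputs are unit vectors."""
    return sum(a * b for a, b in zip(u, v))


def greedy_parse(words, embeddings, image_vec):
    """Merge the adjacent pair whose composed vector best matches the image,
    and repeat until one binary tree (nested tuples) remains."""
    image_vec = normalize(image_vec)
    items = [(w, normalize(e)) for w, e in zip(words, embeddings)]
    while len(items) > 1:
        best_i, best_score, best_vec = 0, -math.inf, None
        for i in range(len(items) - 1):
            vec = compose(items[i][1], items[i + 1][1])
            score = cosine(vec, image_vec)  # stand-in for concreteness
            if score > best_score:
                best_i, best_score, best_vec = i, score, vec
        merged = (items[best_i][0], items[best_i + 1][0])
        items[best_i:best_i + 2] = [(merged, best_vec)]
    return items[0][0]


if __name__ == "__main__":
    # Toy 2-D vectors: "a" and "cat" point toward the image, "sits" does not.
    tree = greedy_parse(
        ["a", "cat", "sits"],
        [[1.0, 0.2], [1.0, 0.0], [0.0, 1.0]],
        image_vec=[1.0, 0.0],
    )
    print(tree)
```

With these toy vectors the pair ("a", "cat") aligns best with the image vector and is merged first, so the more image-like phrase forms a constituent before the verb attaches.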

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.02890/full.md

## Figures

27 figures with captions in the complete paper: https://tomesphere.com/paper/1906.02890/full.md

## References

79 references — full list in the complete paper: https://tomesphere.com/paper/1906.02890/full.md

---
Source: https://tomesphere.com/paper/1906.02890