Syntax-Guided Transformers: Elevating Compositional Generalization and Grounding in Multimodal Environments

Danial Kamali; Parisa Kordjamshidi

arXiv:2311.04364·cs.CL·November 9, 2023·1 cites

Open Access

TL;DR

This paper enhances multimodal AI models' ability to generalize compositionally by leveraging syntactic structures and attention masking, leading to improved grounding and state-of-the-art performance.

Contribution

It introduces syntactic information integration into transformers for multimodal grounding, demonstrating improved compositional generalization and parameter efficiency.

Findings

01. Dependency parsing improves grounding performance.

02. Syntactic attention masking enhances compositional generalization.

03. Weight sharing across Transformer encoders boosts results.
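
The weight-sharing finding can be illustrated with a small sketch. This is not the authors' implementation: `ToyLayer`, `build_stack`, and `count_unique_parameters` are hypothetical names, and a tiny dense layer stands in for a full Transformer encoder layer. The sketch only shows the general idea that reusing one layer at every depth keeps the parameter count constant as the stack grows.

```python
import random


class ToyLayer:
    """A small dense layer with ReLU, standing in for an encoder layer (assumption)."""

    def __init__(self, dim, rng):
        self.weight = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]
        self.bias = [0.0] * dim

    def __call__(self, x):
        return [
            max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(self.weight, self.bias)
        ]

    def num_parameters(self):
        return len(self.weight) * len(self.weight[0]) + len(self.bias)


def build_stack(depth, dim, share_weights, seed=0):
    """Return a list of layers; with sharing, every entry is the same object."""
    rng = random.Random(seed)
    if share_weights:
        layer = ToyLayer(dim, rng)
        return [layer] * depth
    return [ToyLayer(dim, rng) for _ in range(depth)]


def count_unique_parameters(stack):
    """Count parameters once per distinct layer object."""
    unique = {id(layer): layer for layer in stack}
    return sum(layer.num_parameters() for layer in unique.values())


def run_stack(stack, x):
    """Apply the layers in order; a shared stack applies one layer repeatedly."""
    for layer in stack:
        x = layer(x)
    return x


shared = build_stack(depth=6, dim=8, share_weights=True)
unshared = build_stack(depth=6, dim=8, share_weights=False)
print(count_unique_parameters(shared))    # 72
print(count_unique_parameters(unshared))  # 432
```

Both stacks have the same depth and produce outputs of the same shape; only the number of distinct parameters differs.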

Abstract

Compositional generalization, the ability of intelligent models to extrapolate understanding of components to novel compositions, is a fundamental yet challenging facet in AI research, especially within multimodal environments. In this work, we address this challenge by exploiting the syntactic structure of language to boost compositional generalization. This paper elevates the importance of syntactic grounding, particularly through attention masking techniques derived from text input parsing. We introduce and evaluate the merits of using syntactic information in the multimodal grounding problem. Our results on grounded compositional generalization underscore the positive impact of dependency parsing across diverse tasks when utilized with Weight Sharing across the Transformer encoder. The results push the state-of-the-art in multimodal grounding and parameter-efficient modeling and…
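
As a rough illustration of attention masking derived from a dependency parse, the sketch below builds a boolean mask in which each token may attend only to itself, its head, and its direct dependents, then applies that mask inside a softmax. This is a minimal sketch under stated assumptions, not the paper's method: the function names `dependency_mask` and `masked_softmax` are hypothetical, the one-hop masking rule is an assumption, and the parse of the example command is written by hand rather than produced by a parser.

```python
import math


def dependency_mask(heads):
    """Build an n x n mask from a dependency parse.

    heads[i] is the index of token i's head, or -1 for the root.
    mask[i][j] is True when token i may attend to token j: itself,
    its head, or one of its direct dependents (assumed rule).
    """
    n = len(heads)
    mask = [[i == j for j in range(n)] for i in range(n)]
    for child, head in enumerate(heads):
        if head >= 0:
            mask[child][head] = True
            mask[head][child] = True
    return mask


def masked_softmax(scores, mask_row):
    """Softmax over scores where masked-out positions receive zero weight."""
    masked = [s if keep else float("-inf") for s, keep in zip(scores, mask_row)]
    peak = max(masked)  # finite, because each token can always attend to itself
    exps = [math.exp(s - peak) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]


# Hand-written parse (assumption) for the command "walk to the red circle":
# "walk" is the root; "circle" attaches to "walk"; "to", "the", "red" attach to "circle".
tokens = ["walk", "to", "the", "red", "circle"]
heads = [-1, 4, 4, 4, 0]

mask = dependency_mask(heads)
# Attention weights for "red" with uniform raw scores: only "red" and "circle" are visible.
weights = masked_softmax([1.0] * len(tokens), mask[3])
print(weights)  # [0.0, 0.0, 0.0, 0.5, 0.5]
```

In a real model the mask would typically be added to the attention logits of every head and layer where syntactic guidance is wanted; which layers and heads to mask is a design choice this sketch does not address.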

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems

Methods: Attention Is All You Need · Label Smoothing · Linear Layer · Absolute Position Encodings · Residual Connection · Multi-Head Attention · Byte Pair Encoding · Dropout · Softmax · Dense Connections