Augmenting Transformers with Recursively Composed Multi-grained Representations

Xiang Hu; Qingyang Zhu; Kewei Tu; Wei Wu

arXiv:2309.16319 · cs.CL · March 13, 2024

TL;DR

ReCAT is a recursive composition augmented Transformer that explicitly models the hierarchical syntactic structure of raw text. Its contextual inside-outside (CIO) layer enables deep intra-span and inter-span interactions, which improves performance on span-level tasks and yields interpretable syntactic structures.

Contribution

The paper introduces ReCAT, a Transformer augmented with stacked contextual inside-outside (CIO) layers that model hierarchical structure explicitly, improving performance and interpretability without relying on gold trees during either learning or inference.

Findings

01 ReCAT significantly outperforms vanilla Transformers on span-level tasks.

02 ReCAT's hierarchical structures align well with human-annotated syntactic trees.

03 CIO layers enable deep intra-span and inter-span interactions.

Abstract

We present ReCAT, a recursive composition augmented Transformer that is able to explicitly model hierarchical syntactic structures of raw texts without relying on gold trees during both learning and inference. Existing research along this line restricts data to follow a hierarchical tree structure and thus lacks inter-span communications. To overcome the problem, we propose a novel contextual inside-outside (CIO) layer that learns contextualized representations of spans through bottom-up and top-down passes, where a bottom-up pass forms representations of high-level spans by composing low-level spans, while a top-down pass combines information inside and outside a span. By stacking several CIO layers between the embedding layer and the attention layers in Transformer, the ReCAT model can perform both deep intra-span and deep inter-span interactions, and thus generate multi-grained…
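The bottom-up and top-down passes described in the abstract can be illustrated with a toy chart computation. This is only a minimal sketch of a generic inside-outside scheme over all spans, not the paper's CIO layer: the functions `compose` and `score`, the zero outside vector for the root span, the uniform averaging in the top-down pass, and the final concatenation are all assumptions chosen for illustration.

```python
import math

def compose(left, right):
    # Hypothetical composition function: tanh of the elementwise average.
    return [math.tanh(0.5 * (a + b)) for a, b in zip(left, right)]

def score(left, right):
    # Hypothetical split score: dot product of the two child vectors.
    return sum(a * b for a, b in zip(left, right))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_sum(weights, vectors):
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]

def bottom_up(tokens):
    """Build a representation for every span (i, j) from its sub-spans."""
    n = len(tokens)
    inside = {(i, i): list(tokens[i]) for i in range(n)}
    for width in range(1, n):
        for i in range(n - width):
            j = i + width
            pairs = [(inside[(i, k)], inside[(k + 1, j)]) for k in range(i, j)]
            cands = [compose(l, r) for l, r in pairs]
            weights = softmax([score(l, r) for l, r in pairs])
            inside[(i, j)] = weighted_sum(weights, cands)
    return inside

def top_down(inside, n):
    """Give every span a vector summarising the context outside it."""
    dim = len(inside[(0, 0)])
    outside = {(0, n - 1): [0.0] * dim}  # assumption: root has empty context
    for width in range(n - 2, -1, -1):
        for i in range(n - width):
            j = i + width
            cands = []
            for p in range(j + 1, n):  # parent (i, p), sibling on the right
                cands.append(compose(outside[(i, p)], inside[(j + 1, p)]))
            for p in range(0, i):      # parent (p, j), sibling on the left
                cands.append(compose(outside[(p, j)], inside[(p, i - 1)]))
            uniform = [1.0 / len(cands)] * len(cands)
            outside[(i, j)] = weighted_sum(uniform, cands)
    return outside

tokens = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]  # toy 2-d token embeddings
inside = bottom_up(tokens)
outside = top_down(inside, len(tokens))
# Combine inside and outside information per span (here: concatenation).
contextual = {span: inside[span] + outside[span] for span in inside}
```

In this sketch every span of the input, from single tokens up to the whole sequence, ends up with one vector that mixes information from inside and outside the span. In ReCAT, representations of this kind are what the stacked CIO layers pass on to the Transformer attention layers.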

Code & Models

Repositories

ant-research/structuredlm_rtdt · PyTorch · Official

Videos

Augmenting Transformers with Recursively Composed Multi-grained Representations · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Residual Connection · Label Smoothing · Absolute Position Encodings · Dense Connections · Layer Normalization · Byte Pair Encoding