Pre-Training a Graph Recurrent Network for Language Representation

Yile Wang; Linyi Yang; Zhiyang Teng; Ming Zhou; Yue Zhang

arXiv:2209.03834 · cs.CL · October 27, 2022 · 1 citation

TL;DR

This paper introduces a graph recurrent network for language representation that offers a more efficient, linear-complexity alternative to Transformers, demonstrating effectiveness in various language understanding tasks across English and Chinese.

Contribution

The paper adapts an existing graph recurrent network architecture to self-supervised pre-training by optimizing its design, improving efficiency and output diversity over attention-based models.

Findings

01. Achieves linear complexity, reducing inference time.
02. Generates more diverse outputs with less redundancy.
03. Effective in both English and Chinese language tasks.

Abstract

Transformer-based pre-trained models have advanced considerably in recent years, becoming one of the most important backbones in natural language processing. Recent work shows that the attention mechanism inside the Transformer may not be necessary; both convolutional neural networks and multi-layer perceptron based models have also been investigated as Transformer alternatives. In this paper, we consider a graph recurrent network for language model pre-training, which builds a graph structure for each sequence with local token-level communications, together with a sentence-level representation decoupled from other tokens. The original model performs well in domain-specific text classification under supervised training; however, its potential for learning transferable knowledge in a self-supervised way has not been fully exploited. We fill this gap by optimizing the architecture and verifying its…
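The graph structure the abstract describes (local token-level communication plus one sentence-level node) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `build_edges` and `step`, the `window` parameter, and the plain `tanh` update (used here in place of any gating the real model may apply) are all assumptions made for illustration. The sketch only shows why the number of messages per step grows linearly with sequence length, and how token states and the sentence state might be updated in parallel from the previous step.

```python
import math


def build_edges(n, window=1):
    """List directed message edges (src, dst) for n tokens plus one sentence node "g".

    Hypothetical helper: each token hears from tokens within `window`
    positions (itself included) and exchanges messages with the sentence node.
    """
    edges = []
    for i in range(n):
        for j in range(max(0, i - window), min(n, i + window + 1)):
            edges.append((j, i))      # local neighbour (or self) -> token i
        edges.append(("g", i))        # sentence node -> token i
        edges.append((i, "g"))        # token i -> sentence node
    return edges


def step(h, g, x, window=1):
    """One simplified recurrent step (assumed tanh update, no gates).

    h: list of token state vectors, g: sentence state vector,
    x: list of token input vectors; all vectors are lists of floats.
    Every node is updated from the previous step's states only.
    """
    n, d = len(h), len(g)
    new_h = []
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        # Average the previous states of the local neighbourhood.
        ctx = [sum(h[j][k] for j in range(lo, hi)) / (hi - lo) for k in range(d)]
        new_h.append([math.tanh(ctx[k] + x[i][k] + g[k]) for k in range(d)])
    # The sentence node aggregates the previous token states.
    mean_h = [sum(row[k] for row in h) / n for k in range(d)]
    new_g = [math.tanh(g[k] + mean_h[k]) for k in range(d)]
    return new_h, new_g


if __name__ == "__main__":
    # Compare message counts with the n * n pairs of full self-attention.
    for n in (4, 8, 16):
        print(n, len(build_edges(n)), n * n)
```

With `window=1` the sketch produces 5n − 2 edges for n tokens, whereas full pairwise attention involves n² token pairs, which is the sense in which a structure of this kind scales linearly with sequence length.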

Code & Models

Repositories

ylwangy/slstm_pytorch (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Absolute Position Encodings · Adam · Softmax · Multi-Head Attention · Residual Connection · Position-Wise Feed-Forward Layer · Dropout