Pre-Training a Graph Recurrent Network for Language Representation
Yile Wang, Linyi Yang, Zhiyang Teng, Ming Zhou, Yue Zhang

TL;DR
This paper introduces a graph recurrent network for language representation that offers a more efficient, linear-complexity alternative to Transformers, demonstrating effectiveness in various language understanding tasks across English and Chinese.
Contribution
The paper proposes a novel graph recurrent network architecture optimized for self-supervised learning, improving efficiency and output diversity over attention-based models.
Findings
Achieves complexity linear in sequence length, reducing inference time relative to Transformers.
Generates more diverse outputs with less redundancy than attention-based models.
Effective in both English and Chinese language tasks.
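To make the linear-complexity finding concrete, the sketch below counts how many directed messages are exchanged in one update step. It is an illustration under stated assumptions, not the paper's implementation: the function names, the `window` parameter, and the choice of one neighbor on each side are hypothetical, chosen to match the summary's description of local token-level communication plus one sentence-level node.

```python
def graph_edges(n: int, window: int = 1) -> int:
    """Count directed messages per step in an assumed local graph.

    Each token receives from up to `window` neighbors on each side,
    and exchanges one message in each direction with a sentence node.
    """
    local = 0
    for i in range(n):
        lo = max(0, i - window)
        hi = min(n - 1, i + window)
        local += hi - lo  # neighbors in range, excluding the token itself
    sentence = 2 * n  # token -> sentence and sentence -> token
    return local + sentence


def attention_edges(n: int) -> int:
    """Count token pairs scored by full self-attention over n tokens."""
    return n * n


if __name__ == "__main__":
    for n in (4, 1000):
        print(n, graph_edges(n), attention_edges(n))
```

With a fixed window the message count grows in proportion to the number of tokens, while full self-attention grows with its square.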
Abstract
Transformer-based pre-trained models have advanced considerably in recent years, becoming one of the most important backbones in natural language processing. Recent work shows that the attention mechanism inside the Transformer may not be necessary: both convolutional neural networks and multi-layer perceptron based models have also been investigated as Transformer alternatives. In this paper, we consider a graph recurrent network for language model pre-training, which builds a graph structure for each sequence with local token-level communications, together with a sentence-level representation decoupled from other tokens. The original model performs well in domain-specific text classification under supervised training; however, its potential for learning transferable knowledge in a self-supervised way has not been fully exploited. We fill this gap by optimizing the architecture and verifying its…
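The graph structure described in the abstract can be pictured with a toy update step. This is a minimal sketch under loud assumptions: the summary does not give the model's gates or learned weights, so a parameter-free average followed by `tanh` stands in for them, and the function name `step` and its arguments are hypothetical.

```python
import math


def step(tokens, sentence, embeddings):
    """One synchronous update of all token states and the sentence state.

    tokens, embeddings: list of n vectors (lists of floats), each of size d.
    sentence: one vector of size d.
    ASSUMPTION: averaging + tanh replaces the model's learned gated update.
    """
    n, d = len(tokens), len(sentence)
    zero = [0.0] * d
    new_tokens = []
    for i in range(n):
        left = tokens[i - 1] if i > 0 else zero
        right = tokens[i + 1] if i < n - 1 else zero
        # Each token sees only itself, its two neighbors, its input
        # embedding, and the shared sentence-level state.
        new_tokens.append([
            math.tanh((tokens[i][k] + left[k] + right[k]
                       + embeddings[i][k] + sentence[k]) / 5.0)
            for k in range(d)
        ])
    # The sentence node aggregates over all tokens from the previous step.
    new_sentence = [
        math.tanh((sentence[k] + sum(t[k] for t in tokens) / n) / 2.0)
        for k in range(d)
    ]
    return new_tokens, new_sentence


if __name__ == "__main__":
    emb = [[1.0], [0.0], [0.0]]  # signal only at the first token
    toks, sent = [[0.0], [0.0], [0.0]], [0.0]
    for _ in range(3):
        toks, sent = step(toks, sent, emb)
        print(toks, sent)
```

In this toy setting, a signal at the first token reaches its direct neighbor after two steps, while the sentence node gives distant tokens a shortcut that does not depend on their position in the sequence.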
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Absolute Position Encodings · Adam · Softmax · Multi-Head Attention · Residual Connection · Position-Wise Feed-Forward Layer · Dropout
