Generative Pretrained Structured Transformers: Unsupervised Syntactic Language Models at Scale

Xiang Hu; Pengyu Ji; Qingyang Zhu; Wei Wu; Kewei Tu

arXiv:2403.08293 · cs.CL · June 18, 2024 · 1 citation



TL;DR

GPST is an unsupervised, scalable syntactic language model that jointly learns to generate sentences and parse trees, outperforming previous models in language understanding, generation, and grammar induction.

Contribution

Introduces GPST, a novel unsupervised structured transformer model that enables parallel training and surpasses prior models in multiple NLP tasks.

Findings

1. Outperforms GPT-2 on a range of language understanding and generation tasks
2. Significantly better grammar induction results than previous unsupervised models
3. Faster training than existing SLMs

Abstract

A syntactic language model (SLM) incrementally generates a sentence with its syntactic tree in a left-to-right manner. We present Generative Pretrained Structured Transformers (GPST), an unsupervised SLM at scale capable of being pre-trained from scratch on raw texts with high parallelism. GPST circumvents the limitations of previous SLMs such as relying on gold trees and sequential training. It consists of two components, a usual SLM supervised by a uni-directional language modeling loss, and an additional composition model, which induces syntactic parse trees and computes constituent representations, supervised by a bi-directional language modeling loss. We propose a representation surrogate to enable joint parallel training of the two models in a hard-EM fashion. We pre-train GPST on OpenWebText, a corpus with 9 billion tokens, and demonstrate the superiority of GPST over GPT-2…
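The abstract's opening definition of an SLM can be made concrete with a small sketch. Assuming binary trees and a shift-reduce style action set (the GEN/COMP names below are hypothetical and not necessarily GPST's exact action vocabulary), generating a sentence together with its tree amounts to emitting a post-order action sequence, and replaying that sequence with a stack recovers the tree. The sketch omits all neural scoring and does not model the composition model or the hard-EM training.

```python
from typing import List, Tuple, Union

# Hypothetical tree encoding for this sketch: a leaf is a word (str) and an
# internal node is a (left, right) pair, i.e. trees are assumed to be binary.
Tree = Union[str, Tuple["Tree", "Tree"]]
Action = Tuple[str, str]

GEN = "GEN"    # hypothetical action: generate the next word and push it
COMP = "COMP"  # hypothetical action: pop two items, push their constituent


def to_actions(tree: Tree) -> List[Action]:
    """Linearize a binary tree into a left-to-right action sequence.

    A post-order traversal emits each word when it is generated and a COMP
    as soon as both children of a constituent are complete.
    """
    if isinstance(tree, str):
        return [(GEN, tree)]
    left, right = tree
    return to_actions(left) + to_actions(right) + [(COMP, "")]


def from_actions(actions: List[Action]) -> Tree:
    """Replay an action sequence with a stack to recover the tree."""
    stack: List[Tree] = []
    for op, arg in actions:
        if op == GEN:
            stack.append(arg)
        elif op == COMP:
            if len(stack) < 2:
                raise ValueError("COMP needs two items on the stack")
            right = stack.pop()
            left = stack.pop()
            stack.append((left, right))
        else:
            raise ValueError(f"unknown action: {op!r}")
    if len(stack) != 1:
        raise ValueError("actions do not describe a single complete tree")
    return stack[0]


if __name__ == "__main__":
    tree = (("the", "cat"), "sat")
    actions = to_actions(tree)
    print(actions)
    # [('GEN', 'the'), ('GEN', 'cat'), ('COMP', ''), ('GEN', 'sat'), ('COMP', '')]
    print(from_actions(actions) == tree)  # True
```

A binary tree over n words always yields n GEN and n - 1 COMP actions. In a neural SLM, a model would score each next action given the current stack; GPST additionally relies on a separate composition model to induce the trees, which this sketch does not attempt to reproduce.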


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Attention Is All You Need · Softmax · Residual Connection · Weight Decay · Linear Layer · Dense Connections · Adam · Dropout · Multi-Head Attention