DiscoDVT: Generating Long Text with Discourse-Aware Discrete Variational Transformer
Haozhe Ji, Minlie Huang

TL;DR
DiscoDVT introduces a discourse-aware discrete variational Transformer that improves long-range coherence in generated texts by modeling discourse structures through discrete latent variables.
Contribution
It proposes a novel discourse-aware discrete variational Transformer that captures global discourse structure to enhance long text generation coherence.
Findings
Latent codes align with discourse structures
Generated texts show improved long-range coherence
Model outperforms baselines on story datasets
Abstract
Despite recent advances in applying pre-trained language models to generate high-quality texts, generating long passages that maintain long-range coherence remains challenging for these models. In this paper, we propose DiscoDVT, a discourse-aware discrete variational Transformer, to tackle the incoherence issue. DiscoDVT learns a discrete variable sequence that summarizes the global structure of the text and then applies it to guide the generation process at each decoding step. To further embed discourse-aware information into the discrete latent representations, we introduce an auxiliary objective to model the discourse relations within the text. We conduct extensive experiments on two open story generation datasets and demonstrate that the latent codes learn meaningful correspondence to the discourse structures that guide the model to generate long texts with better long-range…
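The abstract describes the mechanism only at a high level. The sketch below illustrates three of its ideas in pure Python, assuming a VQ-VAE-style nearest-neighbour codebook lookup for the discrete latent sequence. That assumption may not match the paper's exact quantization scheme. The sketch also assumes a simple even stretching of the code sequence so that every decoding step sees one code, and a plain softmax cross-entropy for the auxiliary discourse-relation objective. All function names (`quantize`, `expand_to_steps`, `relation_cross_entropy`) are hypothetical. Training details such as gradient estimation through the discrete lookup are omitted.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(hidden: List[Vector], codebook: List[Vector]) -> Tuple[List[int], List[Vector]]:
    """Map each continuous encoder state to its nearest codebook entry.

    Assumption: VQ-VAE-style nearest-neighbour lookup; the paper's actual
    discretization may differ.
    """
    codes = []
    for h in hidden:
        idx = min(range(len(codebook)), key=lambda k: squared_distance(h, codebook[k]))
        codes.append(idx)
    return codes, [codebook[i] for i in codes]


def expand_to_steps(codes: List[int], num_steps: int) -> List[int]:
    """Assign one latent code to every decoding step (hypothetical upsampling).

    The shorter code sequence is stretched evenly over num_steps positions,
    so the decoder can condition on a global-structure code at each step.
    """
    n = len(codes)
    return [codes[min(t * n // num_steps, n - 1)] for t in range(num_steps)]


def relation_cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for a hypothetical discourse-relation classifier."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]
```

For example, with `codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]` and encoder states `[[0.1, -0.2], [0.9, 1.2], [-0.8, 0.7]]`, `quantize` yields codes `[0, 1, 2]`. Calling `expand_to_steps([0, 1, 2], 6)` spreads them as `[0, 0, 1, 1, 2, 2]` over six decoding steps. Because the codes form a short, discrete summary, each one can be read as a coarse plan for a span of the generated text.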
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Residual Connection · Adam · Softmax · Dropout · Dense Connections · Layer Normalization · Absolute Position Encodings
