Hierarchical GPT with Congruent Transformers for Multi-Sentence Language Models

Jihyeon Roh; Huiseong Gim; Soo-Young Lee

arXiv:2009.08636·cs.CL·September 21, 2020

TL;DR

This paper introduces a hierarchical GPT model with congruent transformers designed for multi-sentence dialogue generation and document understanding, improving performance by aligning similarity measures and structuring sentence processing.

Contribution

The paper proposes a novel hierarchical GPT architecture with congruent transformers and a new similarity measure alignment, enhancing multi-sentence language modeling capabilities.

Findings

1. Improved performance on multi-sentence tasks

2. Effective hierarchical structure for dialogue and document understanding

3. Enhanced similarity comparison in transformers

Abstract

We report a GPT-based multi-sentence language model for dialogue generation and document understanding. First, we propose a hierarchical GPT which consists of three blocks: a sentence encoding block, a sentence generating block, and a sentence decoding block. The sentence encoding and decoding blocks are essentially the encoder and decoder blocks of the standard Transformer, and they work on each sentence independently. The sentence generating block is inserted between the encoding and decoding blocks, and generates the next sentence embedding vector from the previous sentence embedding vectors. We believe this is the way humans hold conversations and understand paragraphs and documents. Since each sentence contains relatively few words, the sentence encoding and decoding Transformers can use much lower-dimensional embedding vectors. Secondly, we note the attention in the Transformers utilizes…
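The three-block data flow described in the abstract can be sketched in a few lines of plain Python. This is only an illustration of how the blocks connect, not the authors' implementation: the mean-pooling encoder, the averaging generator, and the dot-product decoder are deliberately trivial stand-ins for the Transformer blocks, and every name here (`make_embedding_table`, `encode_sentence`, `generate_next_embedding`, `decode_sentence`, `hierarchical_step`) is hypothetical.

```python
import random


def make_embedding_table(vocab, dim, seed=0):
    """Hypothetical word-embedding table with random values (illustration only)."""
    rng = random.Random(seed)
    return {tok: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for tok in vocab}


def encode_sentence(tokens, table):
    """Stand-in for the sentence encoding block.

    Maps ONE non-empty sentence to one vector, independently of all other
    sentences. Mean pooling is an assumption replacing a Transformer encoder.
    """
    dim = len(next(iter(table.values())))
    total = [0.0] * dim
    for tok in tokens:
        for i, value in enumerate(table[tok]):
            total[i] += value
    return [t / len(tokens) for t in total]


def generate_next_embedding(prev_embeddings):
    """Stand-in for the sentence generating block.

    Produces the next sentence embedding from the previous sentence
    embeddings only. Averaging is an assumption replacing the GPT-style block.
    """
    dim = len(prev_embeddings[0])
    n = len(prev_embeddings)
    return [sum(e[i] for e in prev_embeddings) / n for i in range(dim)]


def decode_sentence(embedding, table, length=3):
    """Stand-in for the sentence decoding block.

    Turns one sentence embedding back into tokens by ranking the vocabulary
    by dot product with the embedding (an assumption, not a real decoder).
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    ranked = sorted(table, key=lambda tok: dot(table[tok], embedding), reverse=True)
    return ranked[:length]


def hierarchical_step(sentences, table):
    """Encode each sentence, generate the next embedding, decode it."""
    sent_vecs = [encode_sentence(s, table) for s in sentences]  # per sentence
    next_vec = generate_next_embedding(sent_vecs)               # across sentences
    return sent_vecs, next_vec, decode_sentence(next_vec, table)


if __name__ == "__main__":
    vocab = ["hello", "how", "are", "you", "fine", "thanks"]
    table = make_embedding_table(vocab, dim=4)
    dialogue = [["hello", "how", "are", "you"], ["fine", "thanks"]]
    vecs, nxt, words = hierarchical_step(dialogue, table)
    print(len(vecs), len(nxt), words)
```

The point of the sketch is the separation of scales: word-level processing happens only inside `encode_sentence` and `decode_sentence`, one sentence at a time, while the only component that sees more than one sentence operates on a short sequence of sentence vectors.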

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Linear Layer · Cosine Annealing · Linear Warmup With Cosine Annealing · Attention Dropout · Byte Pair Encoding · Discriminative Fine-Tuning · Multi-Head Attention · Attention Is All You Need · Residual Connection