Hierarchical GPT with Congruent Transformers for Multi-Sentence Language Models
Jihyeon Roh, Huiseong Gim, Soo-Young Lee

TL;DR
This paper introduces a hierarchical GPT model with congruent transformers designed for multi-sentence dialogue generation and document understanding, improving performance by aligning similarity measures and structuring sentence processing.
Contribution
The paper proposes a novel hierarchical GPT architecture with congruent transformers and a new similarity measure alignment, enhancing multi-sentence language modeling capabilities.
Findings
Improved performance on multi-sentence tasks
Effective hierarchical structure for dialogue and document understanding
Enhanced similarity comparison in transformers
Abstract
We report a GPT-based multi-sentence language model for dialogue generation and document understanding. First, we propose a hierarchical GPT which consists of three blocks, i.e., a sentence encoding block, a sentence generating block, and a sentence decoding block. The sentence encoding and decoding blocks are essentially the encoder and decoder blocks of the standard Transformer, and they work on each sentence independently. The sentence generating block is inserted between the encoding and decoding blocks, and generates the next sentence embedding vector from the previous sentence embedding vectors. We believe this is how humans hold conversations and understand paragraphs and documents. Since each sentence may consist of fewer words, the sentence encoding and decoding Transformers can use much smaller dimensional embedding vectors. Secondly, we note the attention in the Transformers utilizes…
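The three-block data flow described in the abstract can be sketched as follows. This is only an illustration of how sentences move through the hierarchy, not the authors' implementation: every function name, the toy word embedding, the mean pooling, and the dot-product word selection are hypothetical stand-ins for the learned Transformer blocks, and the congruent attention is not modeled because the excerpt is cut off before describing it.

```python
import random

WORD_DIM = 4  # assumed small word-level embedding size (illustrative only)


def embed_word(word, dim=WORD_DIM):
    """Hypothetical toy embedding: a fixed pseudo-random vector per word."""
    rng = random.Random(word)  # seeding with the word keeps the vector stable
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def encode_sentence(sentence):
    """Stand-in for the sentence encoding block: sees ONE sentence only."""
    vectors = [embed_word(w) for w in sentence.split()]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def generate_next_embedding(previous_embeddings):
    """Stand-in for the sentence generating block.

    Maps all previous sentence embeddings to a predicted embedding of the
    next sentence. A plain average replaces the learned model here.
    """
    count = len(previous_embeddings)
    return [sum(col) / count for col in zip(*previous_embeddings)]


def decode_sentence(embedding, vocabulary, length=3):
    """Stand-in for the sentence decoding block: embedding back to words."""
    def score(word):
        return sum(a * b for a, b in zip(embed_word(word), embedding))
    ranked = sorted(vocabulary, key=score, reverse=True)
    return " ".join(ranked[:length])


def next_sentence(context_sentences, vocabulary):
    """Run encode -> generate -> decode over a list of context sentences."""
    encoded = [encode_sentence(s) for s in context_sentences]  # independent
    predicted = generate_next_embedding(encoded)
    return predicted, decode_sentence(predicted, vocabulary)
```

In this sketch only `generate_next_embedding` sees more than one sentence, which mirrors the abstract's point that the encoding and decoding blocks work on each sentence independently.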
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Linear Layer · Cosine Annealing · Linear Warmup With Cosine Annealing · Attention Dropout · Byte Pair Encoding · Discriminative Fine-Tuning · Multi-Head Attention · Attention Is All You Need · Residual Connection
