Revisiting the Architectures like Pointer Networks to Efficiently Improve the Next Word Distribution, Summarization Factuality, and Beyond

Haw-Shiuan Chang; Zonghai Yao; Alolika Gon; Hong Yu; Andrew McCallum

arXiv:2305.12289·cs.CL·May 23, 2023·1 cites

TL;DR

This paper challenges the dominance of softmax in language models, proposing efficient pointer network alternatives that improve next word prediction and summarization factuality without significant speed loss.

Contribution

It introduces simplified pointer network-based methods as effective softmax alternatives, enhancing language model performance and factual accuracy in summarization tasks.

Findings

01 Outperforms mixture of softmax in GPT-2

02 Improves factCC scores by 2 points in CNN/DM and XSUM

03 Increases MAUVE scores by 30% in BookSum

Abstract

Is the output softmax layer, which is adopted by most language models (LMs), always the best way to compute the next word probability? Given so many attention layers in a modern transformer-based LM, are the pointer networks redundant nowadays? In this study, we discover that the answers to both questions are no. This is because the softmax bottleneck sometimes prevents the LMs from predicting the desired distribution and the pointer networks can be used to break the bottleneck efficiently. Based on the finding, we propose several softmax alternatives by simplifying the pointer networks and accelerating the word-by-word rerankers. In GPT-2, our proposals are significantly better and more efficient than mixture of softmax, a state-of-the-art softmax alternative. In summarization experiments, without significantly decreasing its training/testing speed, our best method based on T5-Small…
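The softmax bottleneck mentioned in the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation: the four-word vocabulary, the 2-D embeddings, and the helper names (`single_softmax`, `two_state_softmax`, `favors`) are all assumptions made for illustration. When output embeddings form a parallelogram, a single hidden state can never rank both "king" and "woman" strictly above both "queen" and "man", because the two pairs of logits always have the same sum. Scoring the words that appear in the context with a second hidden state, loosely in the spirit of a pointer network, removes that constraint.

```python
import math

# Hypothetical 2-D output embeddings arranged as a parallelogram:
# queen - king == woman - man, so king + woman == queen + man.
EMB = {
    "king": (3.0, 3.0),
    "queen": (3.0, 1.0),
    "man": (1.0, 3.0),
    "woman": (1.0, 1.0),
}


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(logits):
    # Subtract the max logit for numerical stability.
    m = max(logits.values())
    exps = {w: math.exp(x - m) for w, x in logits.items()}
    z = sum(exps.values())
    return {w: e / z for w, e in exps.items()}


def single_softmax(h):
    """Standard output layer: one hidden state scores every word."""
    return softmax({w: dot(h, e) for w, e in EMB.items()})


def two_state_softmax(h_vocab, h_ctx, context):
    """Illustrative variant: words seen in the context are scored with a
    separate hidden state (h_ctx); all other words use h_vocab."""
    logits = {
        w: dot(h_ctx if w in context else h_vocab, e) for w, e in EMB.items()
    }
    return softmax(logits)


def favors(probs, wanted, others):
    """True if every wanted word is strictly more likely than every other."""
    return min(probs[w] for w in wanted) > max(probs[w] for w in others)


if __name__ == "__main__":
    wanted, others = ("king", "woman"), ("queen", "man")

    # Search a grid of hidden states: the single softmax never succeeds.
    grid = [x * 0.5 for x in range(-8, 9)]
    found = any(
        favors(single_softmax((a, b)), wanted, others)
        for a in grid
        for b in grid
    )
    print("single softmax can favor king+woman:", found)

    # With a second hidden state for context words, the target is reachable.
    probs = two_state_softmax((-1.0, -1.0), (1.0, 1.0), {"king", "woman"})
    print("two-state softmax favors king+woman:", favors(probs, wanted, others))
```

The grid search only samples hidden states, but the impossibility holds for any hidden state, since the logits of "king" and "woman" always sum to the same value as those of "queen" and "man".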

Code & Models

Repositories

iesl/softmax-cpr (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification

Methods: Multi-Head Attention · Attention Is All You Need · Cosine Annealing · Layer Normalization · Byte Pair Encoding · Dropout · Linear Layer · Attention Dropout · Adam