SentenceVAE: Enable Next-sentence Prediction for Large Language Models with Faster Speed, Higher Accuracy and Longer Context

Hongjun An, Yifan Chen, Zhe Sun, and Xuelong Li

arXiv:2408.00655 · cs.AI · August 15, 2024

TL;DR

SentenceVAE compresses each sentence into a single token and reconstructs it on output, so a large language model can infer sentence by sentence instead of token by token. The reported effect is faster inference, lower perplexity, and a longer usable context.

Contribution

The paper proposes SentenceVAE, a novel approach that integrates sentence-level encoding into LLMs to enhance inference efficiency and semantic integrity over token-by-token methods.

Findings

01 Inference speed increased by 204-365%

02 Perplexity reduced to 46-75% of original

03 Memory overhead decreased by 86-91%

Abstract

Current large language models (LLMs) primarily use a next-token prediction method for inference, which significantly impedes their processing speed. In this paper, we introduce a novel inference methodology termed next-sentence prediction, aimed at enhancing the inference efficiency of LLMs. We present the Sentence Variational Autoencoder (SentenceVAE), which includes a Sentence Encoder to compress multiple tokens in a sentence into a single token, and a Sentence Decoder to reconstruct it. By integrating SentenceVAE into the input and output layers of LLMs, we develop Sentence-level LLMs (SLLMs) that employ a sentence-by-sentence inference method. In addition, the SentenceVAE module of SLLMs can maintain the integrity of the original semantic content by segmenting the context into sentences, thereby improving accuracy while boosting inference speed. Moreover, compared to previous LLMs,…
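
To give a feel for why sentence-by-sentence inference can take fewer steps, here is a minimal toy sketch, not the authors' implementation. It only counts decoding steps: one per token for next-token prediction versus one per sentence for next-sentence prediction. The regex sentence splitter, the whitespace tokenization, and the helper names `split_sentences` and `decoding_steps` are all assumptions made for illustration; the count also ignores the cost of the Sentence Encoder and Sentence Decoder themselves.

```python
import re


def split_sentences(text):
    """Split after '.', '!' or '?' followed by whitespace (a naive, assumed rule)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def decoding_steps(text):
    """Return (token_steps, sentence_steps) for one pass over `text`."""
    sentences = split_sentences(text)
    # Whitespace-separated words stand in for a real subword tokenizer (assumption).
    token_steps = sum(len(s.split()) for s in sentences)
    # One step per sentence: each sentence is treated as a single compressed unit.
    sentence_steps = len(sentences)
    return token_steps, sentence_steps


text = (
    "LLMs predict one token at a time. "
    "SentenceVAE packs a sentence into one token. "
    "Fewer steps follow."
)
tokens, sentences = decoding_steps(text)
print(f"token-level steps: {tokens}, sentence-level steps: {sentences}")
```

The ratio between the two counts is roughly the average sentence length, which is why longer sentences would be expected to benefit more under this simplified view.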

Code & Models

Repositories

BestAnHongjun/SentenceVAE
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings