CliqueParcel: An Approach For Batching LLM Prompts That Jointly Optimizes Efficiency And Faithfulness

Jiayi Liu; Tinghan Yang; Jennifer Neville

arXiv:2402.14833 · cs.CL · February 26, 2024 · 1 citation

Open Access

TL;DR

CliqueParcel is a batching method for LLM prompts that improves inference efficiency while keeping outputs faithful to those produced without batching, addressing the usual trade-off between speed and output quality.

Contribution

This paper introduces CliqueParcel, a prompt batching approach that improves inference efficiency while preserving output accuracy and faithfulness, which existing efficiency strategies often compromise.

Findings

01. Significantly improves inference efficiency on multiple datasets.

02. Maintains high faithfulness and accuracy in outputs.

03. Provides a comprehensive analysis of efficiency-faithfulness trade-offs.

Abstract

Large language models (LLMs) have become pivotal in recent research. However, during the inference process, LLMs still require substantial resources. In this paper, we propose CliqueParcel, a method designed to improve the efficiency of LLMs via prompt batching. Existing strategies to optimize inference efficiency often compromise on output quality, leading to a discounted output problem. This issue might result in reduced accuracy or outputs that are less detailed. CliqueParcel is our answer to this challenge. While ensuring accuracy and minimizing deviations from the original outputs (i.e., faithfulness), our method significantly improves efficiency during inference. To lay the groundwork, we first redefine efficiency measurements by excluding the reduction in running time due to shorter lengths. Then, we provide a comprehensive trade-off between efficiency and faithfulness to…
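
To make the idea of prompt batching concrete, here is a minimal Python sketch of the general technique: several prompts are merged into one numbered request, and the single response is split back into per-prompt answers. This is not the CliqueParcel algorithm, whose details are not given in the abstract. The function names, the `Q<i>`/`A<i>` format, and the instruction wording are all assumptions made for illustration.

```python
import re
from typing import List


def build_batched_prompt(prompts: List[str]) -> str:
    """Merge several prompts into one numbered prompt.

    The 'Q<i>' / 'A<i>' layout is a hypothetical convention, not the paper's.
    """
    header = (
        f"Answer each of the {len(prompts)} questions below. "
        "Start each answer on its own line as 'A<i>: <answer>'.\n"
    )
    body = "\n".join(f"Q{i}: {p}" for i, p in enumerate(prompts, start=1))
    return header + body


def split_batched_answer(response: str, n: int) -> List[str]:
    """Recover per-prompt answers from a single batched response.

    Slots with no matching 'A<i>:' line stay empty, so a caller can
    detect missing answers and re-query those prompts individually.
    """
    answers = [""] * n
    pattern = r"^A(\d+):[ \t]*(.*)$"
    for match in re.finditer(pattern, response, flags=re.MULTILINE):
        idx = int(match.group(1))
        if 1 <= idx <= n:
            answers[idx - 1] = match.group(2).strip()
    return answers
```

Sending one batched prompt instead of several saves the repeated per-request overhead. Comparing each recovered answer with the answer to the same prompt sent alone is one simple way to check faithfulness as the abstract defines it (deviation from the original outputs).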

Taxonomy

Topics: Digital Rights Management and Security · Service-Oriented Architecture and Web Services