CliqueParcel: An Approach For Batching LLM Prompts That Jointly Optimizes Efficiency And Faithfulness
Jiayi Liu, Tinghan Yang, Jennifer Neville

TL;DR
CliqueParcel is a novel batching method for LLM prompts that enhances inference efficiency while maintaining output faithfulness, addressing the common trade-off between speed and quality in large language model applications.
Contribution
This paper introduces CliqueParcel, a prompt batching approach that improves inference efficiency without sacrificing output accuracy or faithfulness, addressing the tendency of existing efficiency methods to degrade output quality.
Findings
Significantly improves inference efficiency on multiple datasets.
Maintains high faithfulness and accuracy in outputs.
Provides a comprehensive analysis of efficiency-faithfulness trade-offs.
Abstract
Large language models (LLMs) have become pivotal in recent research. However, during the inference process, LLMs still require substantial resources. In this paper, we propose CliqueParcel, a method designed to improve the efficiency of LLMs via prompt batching. Existing strategies to optimize inference efficiency often compromise on output quality, leading to a discounted output problem. This issue might result in reduced accuracy or outputs that are less detailed. CliqueParcel is our answer to this challenge. While ensuring accuracy and minimizing deviations from the original outputs (i.e., faithfulness), our method significantly improves efficiency during inference. To lay the groundwork, we first redefine efficiency measurements by excluding the reduction in running time due to shorter lengths. Then, we provide a comprehensive trade-off between efficiency and faithfulness to…
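Example
As a rough illustration of what prompt batching means in general, the sketch below packs several questions into one numbered prompt and then splits a single model reply back into per-question answers. It is not the CliqueParcel algorithm, which the abstract does not specify in detail. The function names, the "Q<number>"/"A<number>" format, and the instruction wording are all hypothetical choices made for this example.

```python
import re
from typing import List


def build_batched_prompt(questions: List[str]) -> str:
    """Pack several questions into one numbered prompt (hypothetical format)."""
    header = (
        "Answer each question below. Reply with one line per question "
        "in the form 'A<number>: <answer>'."
    )
    body = "\n".join(f"Q{i}: {q}" for i, q in enumerate(questions, start=1))
    return f"{header}\n{body}"


def split_batched_answer(response: str, n: int) -> List[str]:
    """Recover per-question answers from one reply.

    Answers the model skipped come back as empty strings, which makes
    a dropped or shortened answer easy to detect downstream.
    """
    answers = [""] * n
    # Match lines such as "A2: Paris"; only spaces or tabs may follow the colon.
    pattern = re.compile(r"^A(\d+):[ \t]*(.*)$", flags=re.MULTILINE)
    for match in pattern.finditer(response):
        idx = int(match.group(1))
        if 1 <= idx <= n:
            answers[idx - 1] = match.group(2).strip()
    return answers
```

Comparing each recovered answer with the answer obtained by sending the same question on its own is one simple way to spot the reduced accuracy or shorter outputs that the abstract calls the discounted output problem.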
Taxonomy
Topics: Digital Rights Management and Security · Service-Oriented Architecture and Web Services
