Query-OPT: Optimizing Inference of Large Language Models via Multi-Query Instructions in Meeting Summarization

Md Tahmid Rahman Laskar; Elena Khasanova; Xue-Yong Fu; Cheng Chen; Shashi Bhushan TN

arXiv:2403.00067 · cs.CL · October 22, 2024 · 1 citation

TL;DR

This paper explores multi-query prompting for large language models to reduce inference costs in meeting summarization, demonstrating that combining queries can maintain performance while saving resources.

Contribution

It introduces a multi-query prompting approach for LLMs in meeting summarization, analyzing its effectiveness and cost-efficiency across various models.

Findings

01. Multi-query prompting reduces inference costs significantly.

02. Closed-source LLMs achieve higher reliability in expected formats.

03. Open-source LLMs lag in response reliability, except some 7B models.
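
The cost finding can be illustrated with a rough input-token count. This is a minimal sketch, not the paper's cost model: it counts input tokens only, ignores output tokens and instruction overhead, and the token figures are hypothetical numbers chosen for illustration.

```python
from typing import List


def input_tokens_single(context_tokens: int, query_tokens: List[int]) -> int:
    """One call per query: the context is re-sent with every query."""
    return sum(context_tokens + q for q in query_tokens)


def input_tokens_multi(context_tokens: int, query_tokens: List[int]) -> int:
    """One call for all queries: the context is sent once."""
    return context_tokens + sum(query_tokens)


# Hypothetical sizes (assumptions, not figures from the paper).
context = 6000          # tokens in the meeting transcript
queries = [20, 25, 15]  # tokens in each query

single = input_tokens_single(context, queries)
multi = input_tokens_multi(context, queries)
savings = 1 - multi / single  # fraction of input tokens avoided
```

Under these assumptions the saving grows with the number of queries per transcript, because the transcript dominates the prompt length and is sent only once.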

Abstract

This work focuses on the task of query-based meeting summarization in which the summary of a context (meeting transcript) is generated in response to a specific query. When using Large Language Models (LLMs) for this task, usually a new call to the LLM inference endpoint/API is triggered for each new query, even if the context stays the same. However, repeated calls to the LLM inference endpoints would significantly increase the costs of using them in production, making LLMs impractical for many real-world use cases. To address this problem, in this paper, we investigate whether combining the queries for the same input context in a single prompt to minimize repeated calls can be successfully used in meeting summarization. In this regard, we conduct extensive experiments by comparing the performance of various popular LLMs: GPT-4, Gemini, Claude-3, LLaMA-2, Mistral, Phi-3, and Qwen-2 in…
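
The idea of combining several queries over one transcript into a single prompt can be sketched as follows. The prompt wording, the JSON output convention, and the function names `build_multi_query_prompt` and `parse_multi_query_response` are assumptions made for illustration; the listing above does not specify the paper's actual prompt or output format.

```python
import json
from typing import Dict, List, Optional


def build_multi_query_prompt(transcript: str, queries: List[str]) -> str:
    """Put all queries for one transcript into a single prompt (hypothetical format)."""
    numbered = "\n".join(f"{i}. {q}" for i, q in enumerate(queries, start=1))
    return (
        "Answer every query below using only the meeting transcript.\n"
        "Return a JSON object whose keys are the query numbers as strings "
        "and whose values are the summaries.\n\n"
        f"Transcript:\n{transcript}\n\n"
        f"Queries:\n{numbered}\n"
    )


def parse_multi_query_response(raw: str, num_queries: int) -> Optional[Dict[int, str]]:
    """Return {query_number: summary}, or None if the response is not in the expected format."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    # Require exactly one string answer per query, keyed "1".."n".
    expected_keys = {str(i) for i in range(1, num_queries + 1)}
    if set(data) != expected_keys:
        return None
    if not all(isinstance(v, str) for v in data.values()):
        return None
    return {int(k): v for k, v in data.items()}
```

A caller would send the built prompt to whichever LLM endpoint it uses and pass the raw text to the parser; a `None` result marks a response that did not follow the requested format, which is one simple way to measure format reliability.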

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques

Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Multi-Head Attention · Layer Normalization · Dropout · Softmax · Dense Connections · Label Smoothing · Adam