Multi-Task Inference: Can Large Language Models Follow Multiple Instructions at Once?

Guijin Son; Sangwon Baek; Sangdae Nam; Ilgyun Jeong; Seungone Kim

arXiv:2402.11597 · cs.CL · June 7, 2024 · 2 cites

TL;DR

This paper evaluates whether large language models can efficiently handle multiple instructions simultaneously, introducing the MTI Bench to measure multi-task inference performance and demonstrating potential improvements in efficiency and accuracy.

Contribution

The authors introduce the MTI Bench, a comprehensive benchmark for multi-task inference, and analyze the performance and efficiency of state-of-the-art LLMs in this setting.

Findings

01

Multi-task inference reduces total inference time by a factor of 1.46 on average, since it does not require multiple inference calls.

02

State-of-the-art LLMs such as Llama-2-Chat-70B and GPT-4 show up to 7.3% and 12.4% improved performance, respectively, with multi-task inference than when the tasks are divided.

03

Multi-task inference can enhance both efficiency and accuracy of LLMs.

Abstract

Large language models (LLMs) are typically prompted to follow a single instruction per inference call. In this work, we analyze whether LLMs also hold the capability to handle multiple instructions simultaneously, denoted as Multi-Task Inference. For this purpose, we introduce the MTI Bench (Multi-Task Inference Benchmark), a comprehensive evaluation benchmark encompassing 5,000 instances across 25 tasks. Each task in the MTI Bench involves 2 to 3 sub-tasks. As expected, we first demonstrate that Multi-Task Inference reduces the total inference time by 1.46 times on average since it does not require multiple inference calls. Interestingly, contrary to the expectation that LLMs would perform better when tasks are divided, we find that state-of-the-art LLMs, such as Llama-2-Chat-70B and GPT-4, show up to 7.3% and 12.4% improved performance, respectively, with Multi-Task Inference compared to Single-Task…
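
The contrast the abstract draws, one inference call per instruction versus a single call carrying all of them, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the MTI Bench prompt format or the authors' code: the prompt wording, the function names, and the `CountingModel` stand-in are all hypothetical, and reading "1.46 times" as a speedup factor is an interpretation of the abstract.

```python
from typing import List


def single_task_prompts(context: str, instructions: List[str]) -> List[str]:
    """Build one prompt per sub-task; each prompt needs its own inference call."""
    return [f"{context}\n\nInstruction: {inst}" for inst in instructions]


def multi_task_prompt(context: str, instructions: List[str]) -> str:
    """Pack every sub-task instruction into a single prompt (one inference call)."""
    numbered = "\n".join(
        f"{i}. {inst}" for i, inst in enumerate(instructions, start=1)
    )
    return f"{context}\n\nFollow all instructions and number your answers:\n{numbered}"


def relative_time(speedup: float) -> float:
    """Fraction of the baseline time that remains after a given speedup factor."""
    return 1.0 / speedup


class CountingModel:
    """Hypothetical stand-in for an LLM client; it only counts how often it is called."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, prompt: str) -> str:
        self.calls += 1
        return f"(response to {len(prompt)} characters)"


if __name__ == "__main__":
    # Illustrative inputs only; these are not instances from the benchmark.
    context = "Passage: The cat sat on the mat."
    instructions = ["Summarize the passage.", "Translate the summary into French."]

    single = CountingModel()
    for prompt in single_task_prompts(context, instructions):
        single(prompt)

    multi = CountingModel()
    multi(multi_task_prompt(context, instructions))

    print(single.calls, multi.calls)
    print(round(relative_time(1.46), 3))
```

With two sub-tasks the single-task route makes two calls and the multi-task route makes one. Under the speedup reading, a factor of 1.46 leaves about 0.685 of the original inference time, a reduction of roughly 31.5%.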

Code & Models

Repositories

guijinson/mti-bench
PyTorch · Official

Videos

Multi-Task Inference: Can Large Language Models Follow Multiple Instructions at Once? · Underline

Taxonomy

Topics: Text Readability and Simplification

Methods: Linear Layer · Dense Connections · Label Smoothing · Adam · Attention Is All You Need · Softmax · Multi-Head Attention · Layer Normalization · Dropout · Residual Connection