# MUX-PLMs: Data Multiplexing for High-throughput Language Models

**Authors:** Vishvak Murahari, Ameet Deshpande, Carlos E. Jimenez, Izhak Shafran, Mingqiu Wang, Yuan Cao, Karthik Narasimhan

arXiv: 2302.12441 · 2023-05-24

## TL;DR

This paper introduces MUX-PLMs, a class of high-throughput pre-trained language models trained with data multiplexing, which achieve 2x/5x inference speedup with only a 1-4% performance drop across a broad suite of tasks.

## Contribution

The paper develops novel multiplexing and demultiplexing modules that entangle and disentangle multiple inputs. These modules enable high-throughput PLMs, trained with data multiplexing, that remain competitive with vanilla PLMs.
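To make the idea of entangling and disentangling inputs concrete, here is a minimal sketch in plain Python. It is not the paper's method: the multiplexing and demultiplexing modules are not described in this summary, so the sketch assumes a simple fixed scheme (random ±1 key vectors, elementwise keying, and averaging). All names (`make_keys`, `multiplex`, `demultiplex`, `shared_model`, `run_demo`) are hypothetical, and `shared_model` is an identity placeholder for the PLM backbone.

```python
import random


def make_keys(num_inputs, dim, seed=0):
    """One fixed random +/-1 key vector per input slot (an illustrative assumption)."""
    rng = random.Random(seed)
    return [[rng.choice((-1.0, 1.0)) for _ in range(dim)] for _ in range(num_inputs)]


def multiplex(inputs, keys):
    """Entangle N vectors into one: key each input elementwise, then average."""
    n = len(inputs)
    dim = len(inputs[0])
    return [sum(k[d] * x[d] for k, x in zip(keys, inputs)) / n for d in range(dim)]


def demultiplex(mixed, keys):
    """Disentangle: re-apply each key. Slot i yields x_i / N plus crosstalk."""
    return [[k[d] * mixed[d] for d in range(len(mixed))] for k in keys]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def shared_model(vector):
    """Hypothetical stand-in for the PLM backbone; identity in this sketch."""
    return list(vector)


def run_demo(num_inputs=2, dim=512, seed=0):
    """Multiplex random inputs, run ONE forward pass, and match outputs to inputs."""
    rng = random.Random(seed + 1)
    inputs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_inputs)]
    keys = make_keys(num_inputs, dim, seed)
    mixed = multiplex(inputs, keys)
    hidden = shared_model(mixed)  # a single pass serves all N inputs
    outputs = demultiplex(hidden, keys)
    # For each demultiplexed output, find the most similar original input.
    return [max(range(num_inputs), key=lambda j: dot(out, inputs[j])) for out in outputs]
```

In this toy setup, recovery is only approximate: each demultiplexed vector contains its own input scaled by 1/N plus interference from the other inputs, and that interference grows with N. This gives some intuition for why multiplexing more inputs trades accuracy for throughput, although the paper's modules may behave quite differently.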

## Key findings

- Achieves 2x/5x inference speedup over vanilla PLMs.
- Keeps the performance drop to only 1-4% at these speedups.
- Demonstrates broad applicability across different NLP tasks.

## Abstract

The widespread adoption of large language models such as ChatGPT and Bard has led to unprecedented demand for these technologies. The burgeoning cost of inference for ever-increasing model sizes, coupled with hardware shortages, has limited affordable access and poses a pressing need for efficiency approaches geared towards high throughput and performance. Multi-input multi-output (MIMO) algorithms such as data multiplexing offer a promising solution with a many-fold increase in throughput by performing inference for multiple inputs at the cost of a single input. Yet these approaches are not currently performant enough to be deployed in modern systems. We change that by developing MUX-PLMs, a class of high-throughput pre-trained language models (PLMs) trained with data multiplexing, that can be fine-tuned for any downstream task to yield high throughput and high performance. Our novel multiplexing and demultiplexing modules proficiently entangle and disentangle inputs, and enable high-performance, high-throughput MUX-PLMs that are competitive with vanilla PLMs while achieving 2x/5x inference speedup with only a 1-4% drop on a broad suite of tasks.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2302.12441/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/2302.12441/full.md

## References

57 references — full list in the complete paper: https://tomesphere.com/paper/2302.12441/full.md

---
Source: https://tomesphere.com/paper/2302.12441