The Impact of Inference Acceleration on Bias of LLMs

Elisabeth Kirsten; Ivan Habernal; Vedant Nanda; Muhammad Bilal Zafar

arXiv:2410.22118·cs.CL·June 9, 2025

TL;DR

This paper investigates how inference acceleration techniques like quantization and pruning affect demographic bias in Large Language Models, revealing complex and unpredictable bias changes that require careful evaluation.

Contribution

The paper presents the first comprehensive analysis of how inference acceleration strategies affect bias in LLMs, and it emphasizes the need to assess bias after optimization.

Findings

01 Bias changes significantly after acceleration methods are applied

02 The effects on bias are complex and model-dependent

03 Bias must be evaluated case by case

Abstract

The last few years have seen unprecedented advances in the capabilities of Large Language Models (LLMs). These advances promise to benefit a vast array of application domains. However, because of the models' immense size, performing inference with LLMs is both costly and slow. Consequently, a plethora of recent work has proposed strategies to improve inference efficiency, e.g., quantization, pruning, and caching. These acceleration strategies reduce inference cost and latency, often by several factors, while maintaining much of the predictive performance as measured by common benchmarks. In this work, we explore another critical aspect of LLM performance: demographic bias in model generations due to inference acceleration optimizations. Using a wide range of metrics, we probe bias in model outputs from a number of angles. Analysis of outputs before and after inference acceleration shows significant…
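The before-and-after comparison described in the abstract can be illustrated with a minimal sketch. It is not the paper's method and does not reproduce any of its metrics: the function names `group_gap` and `bias_shift`, the group labels, and the scores are all hypothetical, and the gap between per-group mean scores is only one simple stand-in for a bias measure.

```python
from statistics import mean


def group_gap(scores_by_group):
    """Largest difference between per-group mean scores (0 means parity).

    `scores_by_group` maps a demographic group label to a list of numeric
    scores assigned to model outputs for that group (hypothetical setup).
    """
    means = {group: mean(values) for group, values in scores_by_group.items()}
    return max(means.values()) - min(means.values())


def bias_shift(before, after):
    """Signed change in the group gap; positive means the gap grew."""
    return group_gap(after) - group_gap(before)


# Toy, made-up scores for a base model and an accelerated variant
# (e.g. quantized or pruned). These are NOT results from the paper.
base = {"group_a": [0.5, 1.0], "group_b": [0.5, 0.5]}
accelerated = {"group_a": [1.0, 1.0], "group_b": [0.5, 0.0]}

shift = bias_shift(base, accelerated)
print(f"gap before: {group_gap(base):.2f}")
print(f"gap after:  {group_gap(accelerated):.2f}")
print(f"shift:      {shift:+.2f}")
```

Reporting a signed shift rather than an absolute one keeps it visible whether the gap widened or narrowed for a given model, which matters if the direction of change differs from model to model.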

Code & Models

Repositories

aisoc-lab/inference-acceleration-bias (Official)

Videos

The Impact of Inference Acceleration on Bias of LLMs · Underline

Taxonomy

Topics: Digital Rights Management and Security