Fairness Evaluation and Inference Level Mitigation in LLMs

Afrozah Nadeem; Mark Dras; Usman Naseem

arXiv:2510.18914 · cs.CL · April 24, 2026 · 2 cites

TL;DR

This paper introduces a dynamic, reversible pruning framework for large language models that adaptively mitigates fairness issues during inference, maintaining coherence and multilingual capabilities in conversations.

Contribution

It proposes a novel inference-time, adaptive neuron masking method that dynamically adjusts bias mitigation in LLMs, unlike static prior approaches.

Findings

01 Improves fairness without sacrificing model coherence.

02 Enables dynamic, context-aware bias mitigation during inference.

03 Maintains multilingual and multi-turn dialogue performance.

Abstract

Large language models often display undesirable behaviors embedded in their internal representations. These behaviors undermine fairness and lead to inconsistency drift, amplification of harmful content, and the propagation of unwanted patterns during extended dialogues. Although training-time or data-centric methods attempt to reduce these effects, they are computationally expensive, irreversible once deployed, and slow to adapt to new conversational contexts. Pruning-based methods provide a flexible and transparent way to reduce bias by adjusting the neurons responsible for certain behaviors. However, most existing approaches are static; once a neuron is removed, the model can no longer adapt when the conversation or context changes. To address this, we propose a dynamic, reversible, pruning-based framework that detects context-aware neuron activations and applies adaptive masking to…
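
The contrast between static pruning and reversible masking can be illustrated with a minimal sketch. It is not the authors' implementation: the function `reversible_mask`, the per-neuron `bias_scores`, and the `threshold` and `strength` parameters are all hypothetical, and the sketch assumes some context-dependent score per neuron is already available. The point it shows is that the mask is applied to activations at inference time, so the underlying weights and the original activations are never altered and the mask can change when the context does.

```python
import numpy as np

def reversible_mask(activations, bias_scores, threshold=0.5, strength=1.0):
    """Damp neurons whose (hypothetical) bias score exceeds a threshold.

    Returns a new masked array and the mask itself. The input array is
    left untouched, so the masking can be undone or recomputed later.
    """
    # Scale factor is (1 - strength) for flagged neurons, 1.0 otherwise.
    mask = np.where(bias_scores > threshold, 1.0 - strength, 1.0)
    return activations * mask, mask

# Toy activations for four neurons (illustrative values only).
activations = np.array([0.2, 1.5, -0.7, 0.9])

# Context A: neurons 1 and 3 are flagged by the assumed scoring step.
scores_a = np.array([0.1, 0.8, 0.3, 0.6])
masked_a, mask_a = reversible_mask(activations, scores_a)

# Context B: nothing is flagged, so the same neurons pass through again.
scores_b = np.array([0.1, 0.2, 0.3, 0.2])
masked_b, mask_b = reversible_mask(activations, scores_b)
```

A static pruning approach would correspond to zeroing the flagged neurons permanently; here the mask is recomputed per context, so a neuron suppressed in context A is fully available in context B.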
