Expert-Guided Extinction of Toxic Tokens for Debiased Generation
Xueyao Sun, Kaize Shi, Haoran Tang, Guandong Xu, Qing Li

TL;DR
This paper introduces EXPOSED, a novel method that effectively reduces social bias in large language models by suppressing toxic tokens without extensive data or complex prompting, improving fairness in generated outputs.
Contribution
EXPOSED is a new expert-guided approach that constructs a debiasing expert from toxic data to suppress harmful tokens in LLM outputs, avoiding extensive fine-tuning or prompt engineering.
Findings
Significantly reduces social bias in LLM outputs
Balances fairness and generation quality effectively
Outperforms existing baselines on fairness benchmarks
Abstract
Large language models (LLMs) can exhibit social bias during generation, especially when performing inference with toxic prompts. Controlling sensitive attributes in generation faces challenges in data distribution, generalizability, and efficiency. Specifically, fine-tuning and retrieval demand an extensive unbiased corpus, while direct prompting requires meticulously curated instructions to correct the output over multiple rounds of thought, which strains memory and inference latency. In this work, we propose the Expert-Guided Extinction of Toxic Tokens for Debiased Generation (EXPOSED) to eliminate undesired harmful outputs from LLMs without the aforementioned requirements. EXPOSED constructs a debiasing expert based on the abundant toxic corpus to expose and elicit the potentially dangerous tokens. It then processes the output to the LLMs and constructs a fair distribution…
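The idea described in the abstract, an expert that exposes likely-toxic tokens whose probability is then suppressed in the LLM's output distribution, can be illustrated with a toy sketch. The abstract is truncated and does not state the actual suppression rule, so everything below is an assumption made for illustration: the function name `suppress_toxic_tokens`, the `threshold` and `penalty` parameters, the three-token vocabulary, and the logit values are all hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def suppress_toxic_tokens(base_logits, expert_logits, threshold=0.3, penalty=5.0):
    """Hypothetical suppression rule (an assumption, not the paper's method).

    Tokens to which the toxic expert assigns probability >= threshold are
    treated as exposed toxic tokens, and their base-model logits are lowered
    by `penalty` before renormalising.
    """
    expert_probs = softmax(expert_logits)
    adjusted = [
        b - penalty if p >= threshold else b
        for b, p in zip(base_logits, expert_probs)
    ]
    return softmax(adjusted)


# Toy next-token example with a made-up three-token vocabulary.
vocab = ["kind", "smart", "<toxic>"]
base_logits = [1.0, 0.5, 2.0]      # base LLM prefers the toxic token
expert_logits = [-1.0, -1.0, 3.0]  # toxic expert strongly prefers it too

base_probs = softmax(base_logits)
fair_probs = suppress_toxic_tokens(base_logits, expert_logits)

for token, before, after in zip(vocab, base_probs, fair_probs):
    print(f"{token:>8}: base={before:.3f} adjusted={after:.3f}")
```

In this toy setting the token flagged by the expert loses most of its probability mass, while the relative ordering of the remaining tokens is unchanged. A real system would operate on full-vocabulary logits from the two models at each decoding step.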
Taxonomy
Topics: Additive Manufacturing and 3D Printing Technologies
