AdvSumm: Adversarial Training for Bias Mitigation in Text Summarization
Mukur Gupta, Nikhil Reddy Varimalla, Nicholas Deas, Melanie Subbiah, Kathleen McKeown

TL;DR
AdvSumm is an adversarial training framework that effectively reduces biases like name-nationality and political framing in text summarization models while maintaining high summarization quality.
Contribution
The paper introduces AdvSumm, a novel adversarial training method with a gradient-guided Perturber to mitigate biases in summarization models, improving fairness and robustness.
Findings
AdvSumm significantly reduces bias in summarization outputs.
It outperforms data augmentation techniques like back-translation.
It maintains high summarization quality while mitigating bias.
Abstract
Large Language Models (LLMs) have achieved impressive performance in text summarization and are increasingly deployed in real-world applications. However, these systems often inherit associative and framing biases from pre-training data, leading to inappropriate or unfair outputs in downstream tasks. In this work, we present AdvSumm (Adversarial Summarization), a domain-agnostic training framework designed to mitigate bias in text summarization through improved generalization. Inspired by adversarial robustness, AdvSumm introduces a novel Perturber component that applies gradient-guided perturbations at the embedding level of Sequence-to-Sequence models, enhancing the model's robustness to input variations. We empirically demonstrate that AdvSumm effectively reduces different types of bias in summarization, specifically name-nationality bias and political framing bias, without…
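The idea of a gradient-guided perturbation at the embedding level can be illustrated with a toy example. The sketch below is not the paper's Perturber, whose details are not given in this summary. It assumes a generic L2-normalized gradient step on the input embeddings, and it swaps the Sequence-to-Sequence model for a hypothetical mean-pool plus linear classifier so the gradient can be written by hand. The names loss_and_grad, perturb, and epsilon are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D vector of logits.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def loss_and_grad(emb, W, label):
    """Cross-entropy loss of a toy mean-pool + linear model,
    plus its gradient with respect to the token embeddings.

    emb: (T, d) token embeddings, W: (d, C) weights, label: class index.
    This toy model is an assumption standing in for a real summarizer.
    """
    T = emb.shape[0]
    pooled = emb.mean(axis=0)            # (d,)
    probs = softmax(pooled @ W)          # (C,)
    loss = -np.log(probs[label])
    dlogits = probs.copy()
    dlogits[label] -= 1.0                # d loss / d logits
    dpooled = W @ dlogits                # d loss / d pooled, shape (d,)
    grad = np.tile(dpooled / T, (T, 1))  # mean-pooling spreads it evenly over tokens
    return loss, grad

def perturb(emb, grad, epsilon):
    """Move the embeddings a distance epsilon along the loss gradient
    (an assumed L2-normalized step, not the paper's exact rule)."""
    norm = np.linalg.norm(grad)
    if norm == 0.0:
        return emb.copy()
    return emb + epsilon * grad / norm

rng = np.random.default_rng(0)
emb = rng.normal(size=(5, 8))   # 5 tokens, 8-dimensional embeddings
W = rng.normal(size=(8, 3))     # 3 output classes
label = 1

clean_loss, grad = loss_and_grad(emb, W, label)
adv_emb = perturb(emb, grad, epsilon=0.5)
adv_loss, _ = loss_and_grad(adv_emb, W, label)
print(f"clean loss: {clean_loss:.4f}, perturbed loss: {adv_loss:.4f}")
```

In adversarial training of this general kind, the model is then optimized on the perturbed embeddings (often together with the clean ones), which pushes it to produce stable outputs under small input variations. How AdvSumm combines these terms is not stated in this summary.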
Taxonomy
Topics: Topic Modeling · Misinformation and Its Impacts · Text Readability and Simplification
