DSO: Direct Steering Optimization for Bias Mitigation
Lucas Monteiro Paes, Nivedha Sivakumar, Yinong Oliver Wang, Masha Fedzechkina, Barry-John Theobald, Luca Zappella, Nicholas Apostoloff

TL;DR
This paper introduces DSO, a reinforcement learning-based method for inference-time bias mitigation in vision-language and language models, enabling controllable trade-offs between fairness and performance.
Contribution
DSO is the first method to optimize steering transformations directly for bias mitigation, improving the fairness-performance trade-off in large models.
Findings
DSO outperforms existing methods in fairness-performance trade-offs.
DSO provides practitioners with inference-time control over bias and capabilities.
DSO achieves state-of-the-art results on both VLMs and LLMs.
Abstract
Generative models are often deployed to make decisions on behalf of users, such as vision-language models (VLMs) identifying which person in a room is a doctor to help visually impaired individuals. Yet, VLM decisions are influenced by the perceived demographic attributes of people in the input, which can lead to biased outcomes like failing to identify women as doctors. Moreover, when reducing bias leads to performance loss, users may have varying needs for balancing bias mitigation with overall model capabilities, highlighting the demand for methods that enable controllable bias reduction during inference. Activation steering is a popular approach for inference-time controllability that has shown potential in inducing safer behavior in large language models (LLMs). However, we observe that current steering methods struggle to correct biases, where equiprobable outcomes across…
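For readers unfamiliar with activation steering, the following is a minimal sketch of the generic additive form, in which a hidden state is shifted along a fixed direction and a scalar strength is chosen at inference time. It is not DSO itself: the function name `steer`, the toy arrays, and the unit-normalization step are illustrative assumptions, and the paper's RL-optimized steering transformation is not reproduced here.

```python
import numpy as np

def steer(hidden, direction, strength):
    """Shift each hidden state along a unit-normalized direction.

    hidden:    (n_tokens, d) array of activations (toy stand-in for a model layer)
    direction: (d,) steering vector (hypothetical; not learned as in the paper)
    strength:  scalar chosen at inference time; 0.0 leaves activations unchanged
    """
    unit = direction / np.linalg.norm(direction)
    return hidden + strength * unit

# Toy activations for two tokens in a 3-dimensional hidden space.
hidden = np.array([[1.0, 2.0, 3.0],
                   [0.0, -1.0, 0.5]])
direction = np.array([0.0, 3.0, 4.0])

unchanged = steer(hidden, direction, 0.0)  # no steering applied
shifted = steer(hidden, direction, 2.0)    # each token moves 2 units along the direction
```

The single `strength` parameter illustrates the kind of inference-time knob the abstract refers to: a larger value moves activations further along the steering direction, while zero recovers the unmodified model.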
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Ethics and Social Impacts of AI
