SEM: Sparse Embedding Modulation for Post-Hoc Debiasing of Vision-Language Models
Quentin Guimard, Federico Bartsch, Simone Caldarella, Rahaf Aljundi, Elisa Ricci, Massimiliano Mancini

TL;DR
This paper introduces SEM, a post-hoc debiasing method for vision-language models such as CLIP. It works in a sparse autoencoder latent space to disentangle and modulate bias-related features while preserving semantic content.
Contribution
SEM is the first approach to operate in a sparse autoencoder space for post-hoc debiasing, enabling precise bias removal while maintaining semantic fidelity.
Findings
SEM improves fairness in retrieval tasks across multiple datasets.
SEM enhances zero-shot classification fairness without degrading accuracy.
Sparse latent representations are effective for debiasing vision-language models.
Abstract
Models that bridge vision and language, such as CLIP, are key components of multimodal AI, yet their large-scale, uncurated training data introduce severe social and spurious biases. Existing post-hoc debiasing methods often operate directly in the dense CLIP embedding space, where bias and task-relevant information are highly entangled. This entanglement limits their ability to remove bias without degrading semantic fidelity. In this work, we propose Sparse Embedding Modulation (SEM), a post-hoc, zero-shot debiasing framework that operates in a Sparse Autoencoder (SAE) latent space. By decomposing CLIP text embeddings into disentangled features, SEM identifies and modulates bias-relevant neurons while preserving query-relevant ones. This enables more precise, non-linear interventions. Across four benchmark datasets and two CLIP backbones, SEM achieves substantial fairness gains in…
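The abstract describes the pipeline only at a high level: encode a CLIP text embedding with an SAE, find bias-relevant latents, modulate them while leaving query-relevant latents alone, and decode. The following is a minimal NumPy sketch of that general idea, not the authors' implementation. Everything concrete in it is an assumption made for illustration: the toy random SAE weights, the function names `encode`, `decode`, and `modulate`, the top-k mean-activation rule for picking bias and query latents, the `scale` attenuation, and the choice to add back the SAE reconstruction residual.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 16, 64  # toy embedding and latent sizes (hypothetical, not CLIP's)

# Toy SAE parameters. A real SAE would be trained on CLIP text embeddings.
W_enc = rng.normal(size=(m, d)) / np.sqrt(d)
b_enc = np.zeros(m)
W_dec = rng.normal(size=(d, m)) / np.sqrt(m)
b_dec = np.zeros(d)


def encode(x):
    """Map embedding(s) of shape (..., d) to sparse ReLU latents (..., m)."""
    return np.maximum(0.0, (x - b_dec) @ W_enc.T + b_enc)


def decode(z):
    """Map latents of shape (..., m) back to embedding space (..., d)."""
    return z @ W_dec.T + b_dec


def modulate(x, bias_embs, query_emb, top_k=8, scale=0.0):
    """Attenuate latents that fire on bias prompts but not on the query.

    Assumed selection rule: the top_k latents by mean activation over
    bias_embs are "bias-relevant"; the top_k latents of query_emb are
    "query-relevant" and are left untouched.
    """
    z = encode(x)
    bias_idx = np.argsort(encode(bias_embs).mean(axis=0))[-top_k:]
    query_idx = np.argsort(encode(query_emb))[-top_k:]
    target = np.setdiff1d(bias_idx, query_idx)

    z_mod = z.copy()
    z_mod[target] *= scale  # scale=0.0 switches the selected latents off

    # Assumption: keep the part of x the SAE fails to reconstruct.
    x_out = decode(z_mod) + (x - decode(z))
    return x_out / np.linalg.norm(x_out), target


# Random stand-ins for a query embedding and a few bias-prompt embeddings.
x = rng.normal(size=d)
bias_embs = rng.normal(size=(4, d))
query_emb = rng.normal(size=d)

debiased, touched = modulate(x, bias_embs, query_emb)
print(debiased.shape, touched)
```

Because the edit is applied to individual latents after a ReLU encoder, the overall map from input embedding to output embedding is non-linear, which is the property the abstract contrasts with interventions made directly in the dense embedding space. With `scale=1.0` the sketch returns the normalized input unchanged, which is a convenient sanity check.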
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
