SANER: Annotation-free Societal Attribute Neutralizer for Debiasing CLIP
Yusuke Hirota, Min-Hung Chen, Chien-Yi Wang, Yuta Nakashima, Yu-Chiang Frank Wang, Ryo Hachiuma

TL;DR
SANER is a novel debiasing method for CLIP that removes societal bias by neutralizing attribute information in text features without needing attribute annotations, outperforming existing techniques.
Contribution
Introduces SANER, a simple and effective debiasing approach that eliminates societal bias in CLIP without using attribute annotations or losing attribute-specific information.
Findings
SANER outperforms existing debiasing methods in experiments.
It effectively removes societal bias while preserving attribute-specific information.
SANER does not require attribute annotations during debiasing.
Abstract
Large-scale vision-language models, such as CLIP, are known to contain societal bias regarding protected attributes (e.g., gender, age). This paper aims to address the problems of societal bias in CLIP. Although previous studies have proposed to mitigate societal bias through adversarial learning or test-time projection, our comprehensive study of these works identifies two critical limitations: 1) loss of attribute information when it is explicitly disclosed in the input and 2) use of attribute annotations during the debiasing process. To mitigate societal bias in CLIP and overcome these limitations simultaneously, we introduce a simple-yet-effective debiasing method called SANER (societal attribute neutralizer) that eliminates attribute information only from the CLIP text features of attribute-neutral descriptions. Experimental results show that SANER, which does not require attribute…
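The abstract does not give SANER's loss. The sketch below shows one way an annotation-free neutralization objective could work. It assumes that each attribute-neutral description (e.g., "a photo of a doctor") can be expanded by template into attribute-specific variants ("a photo of a female doctor", "a photo of a male doctor"), so no image-level attribute labels are needed. The function names, the toy 2-D vectors, and the variance-of-cosine-similarity loss are hypothetical illustrations, not the paper's formulation. The loss is zero when the neutral feature is equally similar to every attribute-specific feature.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def equidistance_loss(neutral: Sequence[float],
                      attribute_feats: List[Sequence[float]]) -> float:
    """Hypothetical annotation-free loss: the variance of the cosine
    similarities between a neutral text feature and the features of its
    attribute-specific variants. It is zero when the neutral feature is
    equally close to every attribute group."""
    sims = [cosine(neutral, f) for f in attribute_feats]
    mean = sum(sims) / len(sims)
    return sum((s - mean) ** 2 for s in sims) / len(sims)


# Toy 2-D vectors standing in for CLIP text features (not real CLIP outputs).
female = [1.0, 0.0]   # "a photo of a female doctor"
male = [0.0, 1.0]     # "a photo of a male doctor"
biased = [0.9, 0.1]   # "a photo of a doctor", leaning toward "female"
neutral = [0.5, 0.5]  # equally close to both groups

print(equidistance_loss(biased, [female, male]))   # ≈ 0.195
print(equidistance_loss(neutral, [female, male]))  # 0.0
```

A debiasing layer trained to minimize such a term would push neutral descriptions toward the "equidistant" region. The attribute-specific features themselves are only used as fixed references.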
Peer Reviews
Decision: ICLR 2025 Poster
The paper is very clearly written. Whenever a term is unclear, footnotes aid understanding. The analysis of debiasing techniques is presented so clearly that it could be used for a tutorial. The experimental setup is well done, and the evaluation metrics, such as measuring the difference between a uniform distribution and the potentially gender-biased distribution of generated images, seem adequate for the task. Showing generated images, e.g., with Stable Diffusion, also shows a very immediate prac…
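The reviewer mentions an evaluation that compares the attribute distribution of generated images against a uniform distribution. The sketch below shows one possible instantiation. It assumes each generated image has already been assigned a predicted attribute label (for example, by some attribute classifier, which is not shown), and it uses total variation distance purely as an illustrative choice. The paper's exact metric may differ.

```python
from collections import Counter
from typing import Iterable, List


def deviation_from_uniform(labels: Iterable[str], groups: List[str]) -> float:
    """Total variation distance between the empirical attribute distribution
    of generated images and the uniform distribution over `groups`.
    The result is 0.0 for a perfectly balanced set and 1 - 1/len(groups)
    when every image falls into a single group.
    Assumes every label belongs to `groups`."""
    label_list = list(labels)
    if not label_list:
        raise ValueError("need at least one labeled image")
    counts = Counter(label_list)
    n = len(label_list)
    uniform = 1.0 / len(groups)
    return 0.5 * sum(abs(counts[g] / n - uniform) for g in groups)


# Hypothetical predicted genders for 10 images generated from "a photo of a doctor".
preds = ["male"] * 8 + ["female"] * 2
print(round(deviation_from_uniform(preds, ["female", "male"]), 3))  # 0.3
```

Lower is better. A debiased text encoder should move this number toward 0 for attribute-neutral prompts.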
The attribute groups will be limited to a specific set of defined attributes, which need to be agreed upon and could be debatable, e.g., "pregnant" (if I understand list C in the appendix correctly). There may also be missed attributes. This does not seem like a big issue, but it could be one in an adversarial setting. In general, the benefit of not needing a dataset annotated with attributes seems to be bought by needing a general list of attributes. This may not be a weakness per se, but it would…
- The problems of existing methods are straightforward, and the authors conduct several experiments to verify these phenomena.
- The proposed method performs well.
- The experiments are conducted on both text-to-image retrieval and generation tasks.
- Could the authors provide ARL results to verify the loss of attribute information? Besides, do other existing methods, such as Mapper and prompt tuning-based debiasing, also lose attribute information when debiasing?
- Although effective in debiasing CLIP, the proposed method appears ad hoc for this specific task and lacks major technical contributions, as most components seem to be existing technologies or tricks. Therefore, this paper may not have significant influence or provide…
The two identified challenges, especially the loss of attribute information, sound critical and interesting. It seems like the pipeline only involves lightweight training, as only the debiasing layer is trained. The pipeline is evaluated on two different downstream tasks.
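The reviewer notes that only the debiasing layer is trained. The toy, pure-Python sketch below illustrates what that means. The CLIP text features stay frozen (here, fixed vectors), and gradient descent updates only the weight matrix W of a hypothetical residual layer y = x + Wx. The loss is an assumption for illustration and is not SANER's objective. It combines a squared similarity gap between two attribute-specific features with an L2 term that keeps y close to x. The attribute-specific features are never passed through the layer, which mirrors the abstract's point that only attribute-neutral descriptions are modified.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def debias(w: Matrix, x: Vector) -> Vector:
    """Hypothetical residual debiasing layer y = x + W x. With W = 0 the
    layer is the identity and leaves the frozen feature unchanged."""
    return [xi + di for xi, di in zip(x, matvec(w, x))]


def train_debiasing_layer(x: Vector, attr_a: Vector, attr_b: Vector,
                          lam: float = 0.01, lr: float = 0.5,
                          steps: int = 200) -> Matrix:
    """Fit only W. Here x, attr_a, and attr_b stand for frozen CLIP text features.
    Illustrative loss: (attr_a.y - attr_b.y)^2 + lam * ||y - x||^2."""
    d = len(x)
    w = [[0.0] * d for _ in range(d)]
    g = [a - b for a, b in zip(attr_a, attr_b)]
    for _ in range(steps):
        y = debias(w, x)
        gap = dot(g, y)
        # dL/dy: gap-equalizing term plus the feature-preservation term.
        grad_y = [2 * gap * gi + 2 * lam * (yi - xi)
                  for gi, yi, xi in zip(g, y, x)]
        # y_i = x_i + sum_j W_ij x_j, so dL/dW_ij = dL/dy_i * x_j.
        for i in range(d):
            for j in range(d):
                w[i][j] -= lr * grad_y[i] * x[j]
    return w


# Toy 2-D features (not real CLIP outputs); the step size suits this toy scale only.
female = [1.0, 0.0]
male = [0.0, 1.0]
doctor = [0.9, 0.1]

w = train_debiasing_layer(doctor, female, male)
y = debias(w, doctor)
print(round(dot(female, doctor) - dot(male, doctor), 3))  # 0.8 (before)
print(round(dot(female, y) - dot(male, y), 3))            # 0.004 (after)
```

With lam > 0, the gap is not driven exactly to zero. For this toy setup, it converges to 0.8 · lam / (2 + lam) ≈ 0.004, so lam trades neutrality against staying close to the original feature.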
1. Although replacing the protected attribute words with an attribute-neutral word is a sound neutralization method, it requires a comprehensive list of the protected attributes (as listed in Appx. C). However, in practice, and especially in non-binary cases, it will be very hard or nearly impossible to compile a complete list. Further, since the "attribute annotation-free debiasing loss" relies on a set of attribute-specific descriptions, creating the attribute-specific descriptions set may l…
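The word-replacement neutralization the reviewer describes can be sketched as a simple dictionary lookup. The mapping below is a deliberately tiny, hypothetical list and not the one in Appx. C. It also makes the reviewer's concern concrete. Words missing from the list pass through untouched, and a naive map mishandles grammar: "her" as an object pronoun should become "them", and "she is" becomes "they is".

```python
import re
from typing import Dict

# Hypothetical, deliberately tiny word map; the paper's list (Appx. C) differs.
NEUTRAL_WORDS: Dict[str, str] = {
    "woman": "person", "man": "person",
    "women": "people", "men": "people",
    "girl": "child", "boy": "child",
    "she": "they", "he": "they",
    "her": "their", "his": "their",
}

# Whole-word, case-insensitive match of any listed attribute word.
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, NEUTRAL_WORDS)) + r")\b", re.IGNORECASE
)


def neutralize(text: str) -> str:
    """Replace listed protected-attribute words with attribute-neutral ones,
    keeping an initial capital. Unlisted words are left unchanged."""
    def repl(m: "re.Match[str]") -> str:
        word = m.group(0)
        out = NEUTRAL_WORDS[word.lower()]
        return out.capitalize() if word[0].isupper() else out

    return _PATTERN.sub(repl, text)


print(neutralize("A woman is riding her bike."))  # A person is riding their bike.
print(neutralize("A nonbinary doctor smiles."))   # A nonbinary doctor smiles.
```

Because the substitution is purely lexical, coverage depends entirely on the list. That limitation is the one the review raises.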
Taxonomy
Topics: Hate Speech and Cyberbullying Detection
Methods: Contrastive Language-Image Pre-training
