Aligning Visual Contrastive learning models via Preference Optimization
Amirabbas Afzali, Borna Khodabandeh, Ali Rasekh, Mahyar JafariNodeh, Sepehr Kazemi, Simon Gottschalk

TL;DR
This paper introduces a novel preference optimization approach to enhance contrastive learning models, improving robustness, fairness, and alignment with human preferences, especially against typographic attacks and gender bias.
Contribution
The paper pioneers the application of preference optimization methods to contrastive learning, improving model robustness and bias mitigation beyond traditional techniques.
Findings
Models trained with preference optimization outperform standard contrastive models.
Enhanced robustness against typographic attacks demonstrated.
Improved disentanglement of gender concepts and bias reduction.
Abstract
Contrastive learning models have demonstrated impressive abilities to capture semantic similarities by aligning representations in the embedding space. However, their performance can be limited by the quality of the training data and its inherent biases. While Preference Optimization (PO) methods such as Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO) have been applied to align generative models with human preferences, their use in contrastive learning has yet to be explored. This paper introduces a novel method for training contrastive learning models using different PO methods to break down complex concepts. Our method systematically aligns model behavior with desired preferences, enhancing performance on the targeted task. In particular, we focus on enhancing model robustness against typographic attacks and inductive biases, commonly seen in…
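The abstract describes the idea only at a high level. Below is a minimal sketch of how a DPO-style objective could be attached to a CLIP-like model. It rests on assumptions that are ours, not the paper's:

- The "policy" is a softmax over candidate text prompts of scaled cosine similarities.
- Only the image encoder is fine-tuned, and a frozen copy serves as the reference.
- For a typographic-attack image, the true-class prompt is preferred over the prompt that matches the overlaid text.

The helper names, `beta=0.1` and `logit_scale=100.0` are illustrative choices, not values taken from the paper.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def label_log_probs(image_emb: Sequence[float], text_embs: Sequence[Sequence[float]],
                    logit_scale: float = 100.0) -> List[float]:
    """Log-softmax over candidate text prompts of scaled image-text cosine similarities."""
    logits = [logit_scale * cosine(image_emb, t) for t in text_embs]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return [l - log_z for l in logits]


def dpo_loss(policy_img: Sequence[float], ref_img: Sequence[float],
             text_embs: Sequence[Sequence[float]], preferred: int, dispreferred: int,
             beta: float = 0.1, logit_scale: float = 100.0) -> float:
    """DPO loss for one (image, preferred prompt, dispreferred prompt) triple.

    policy_img: embedding from the image encoder being fine-tuned (assumed setup).
    ref_img: embedding of the same image from a frozen reference encoder.
    """
    pol = label_log_probs(policy_img, text_embs, logit_scale)
    ref = label_log_probs(ref_img, text_embs, logit_scale)
    # Log-ratio margin: how much more the policy, relative to the reference,
    # favours the preferred prompt over the dispreferred one.
    margin = (pol[preferred] - ref[preferred]) - (pol[dispreferred] - ref[dispreferred])
    z = beta * margin
    # -log(sigmoid(z)), written to avoid overflow for large |z|
    return math.log1p(math.exp(-z)) if z >= 0 else -z + math.log1p(math.exp(z))


if __name__ == "__main__":
    # Hypothetical 2-D toy embeddings: index 0 = true-class prompt, 1 = typographic-text prompt.
    prompts = [[1.0, 0.0], [0.0, 1.0]]
    reference = [0.6, 0.8]   # frozen encoder is pulled towards the overlaid text
    fine_tuned = [0.8, 0.6]  # fine-tuned encoder leans towards the true class
    print(dpo_loss(reference, reference, prompts, preferred=0, dispreferred=1))  # equals log(2)
    print(dpo_loss(fine_tuned, reference, prompts, preferred=0, dispreferred=1))
```

When the fine-tuned and reference embeddings coincide, the margin is zero and the loss equals log 2. Moving the image embedding towards the true-class prompt drives the loss below that value. The frozen reference makes the objective relative: it rewards shifts away from the reference's preferences, not absolute similarity scores.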
Peer Reviews
Decision: ICLR 2025 Poster
- This paper appears to be the first paper to apply preference optimisation to contrastive models, and presents an interesting use of SVD to control model behaviour. - Optimising robustness and mitigating (gender) biases are of significant interest, especially in high-risk domains. - The evaluation results suggest comparable and often better performance than alternative approaches in improving robustness while enabling a (to some degree) interpretable intervention technique. - The paper is we
- Despite improving robustness over baseline methods in some datasets, none of the methods consistently outperforms other methods (see Table 1). - The baseline methods, PAINT and Defense-prefix, and their differences to the proposed method are not explained in the paper. Minor Comments: - Line 23: Incomplete sentence „Our experiments We demonstrate“. - Line 256: Comma instead of dot used. - Line 258: Comma should be a dot, and dot should be a comma. - Line 289: „this“ -> „This“ - The di
Originality: This is the first work to improve contrastive learning models through Preference Optimization. The idea of leveraging true labels and typographic labels for preferences, instead of curating a separate preference set from human annotation, is novel and interesting. Clarity: This paper is well-written and has very clear motivations, backgrounds, methods, and experiments. Significance: The topic of aligning human preferences in contrastive learning is impactful, as models like CLIP
Significance: this paper relies on a preference dataset, which requires heavy annotations and the preference set will be very small compared to the training set of CLIP. Also, the preference would be very task-specific (e.g., typographic or gender), limiting the generalizability of the approach to new, unseen attacks or biases. Quality: the inclusion of SVD makes it much slower to fine-tune on a larger scale. Also, the experiments focus on controlled, relatively smaller-scale datasets (the larg
1. The proposed method is simple yet effective. 2. The authors provide a new perspective on IPO and DPO concerning the representation space learned by CLIP. 3. The alignment controllability through $t$ is effective. 4. The background and motivation are well-organized.
1. Clarity needs improvement. * $\mathcal{L}_{pref}$ in (10) appears without a definition. In Corollary 3.2, it is assumed to be either the DPO loss or IPO loss, while the experiments further include the case of KTO loss. * In (9), $\mathcal{I}_{ref}$ is frozen and has no trainable parameters, contributing solely to per-example weighting when substituted in (5), (6), and (7). It is recommended to clarify this in advance. * In Fig.1, $\mathcal{L}_{pref}$ is computed with the given tri
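One reviewer notes that $\mathcal{L}_{pref}$ in (10) is left generic. Corollary 3.2 takes it to be either the DPO or the IPO loss, and the experiments also use KTO. DPO and IPO can both be written as functions of the same log-ratio margin

$h = \log\frac{\pi_\theta(y_w\mid x)}{\pi_{ref}(y_w\mid x)} - \log\frac{\pi_\theta(y_l\mid x)}{\pi_{ref}(y_l\mid x)}$,

with DPO minimising $-\log\sigma(\beta h)$ and IPO minimising $(h - \frac{1}{2\tau})^2$. Swapping one for the other is therefore a one-line change. The sketch below uses those standard forms. The function names, the registry and the `beta`/`tau` defaults are illustrative assumptions. KTO, which works on unpaired examples, is not shown.

```python
import math


def preference_margin(pol_w: float, pol_l: float, ref_w: float, ref_l: float) -> float:
    """h = [log pi(y_w|x) - log pi_ref(y_w|x)] - [log pi(y_l|x) - log pi_ref(y_l|x)]."""
    return (pol_w - ref_w) - (pol_l - ref_l)


def dpo(h: float, beta: float = 0.1) -> float:
    """DPO: -log(sigmoid(beta * h)); keeps shrinking as the margin grows."""
    z = beta * h
    return math.log1p(math.exp(-z)) if z >= 0 else -z + math.log1p(math.exp(z))


def ipo(h: float, tau: float = 0.1) -> float:
    """IPO: squared distance of the margin from the fixed target 1 / (2 * tau)."""
    return (h - 1.0 / (2.0 * tau)) ** 2


# Hypothetical registry: choose L_pref by name without touching the training loop.
L_PREF = {"dpo": dpo, "ipo": ipo}

if __name__ == "__main__":
    # Toy log-probabilities for the preferred (w) and dispreferred (l) labels.
    h = preference_margin(pol_w=-0.1, pol_l=-3.0, ref_w=-1.0, ref_l=-1.0)
    for name, loss in L_PREF.items():
        print(name, loss(h))
```

The practical difference is that IPO's target is bounded. A margin beyond $1/(2\tau)$ is penalised again, whereas DPO keeps rewarding ever larger margins. That unbounded push is the overfitting behaviour IPO was proposed to limit.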
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Constraint Satisfaction and Optimization
Methods: Contrastive Learning · Focus · ALIGN · Contrastive Language-Image Pre-training · Parrot optimizer: Algorithm and applications to medical problems
