Fast Adaptation with Bradley-Terry Preference Models in Text-To-Image Classification and Generation
Victor Gallego

TL;DR
This paper introduces a fast adaptation method using Bradley-Terry preference models to personalize large multimodal models like CLIP and Stable Diffusion for specific human preferences with minimal data and computational resources.
Contribution
The paper develops a novel, efficient fine-tuning approach that leverages Bradley-Terry models to adapt multimodal models to individual preferences, requiring few examples and little computation.
Findings
Effective preference prediction as reward models
Improved image generation aligned with user preferences
Minimal data and computational requirements achieved
Abstract
Recently, large multimodal models such as CLIP and Stable Diffusion have achieved tremendous success in both foundations and applications. However, as these models grow in parameter count and computational requirements, it becomes more challenging for users to personalize them for specific tasks or preferences. In this work, we address the problem of adapting these models to sets of particular human preferences, aligning the retrieved or generated images with the preferences of the user. We leverage the Bradley-Terry preference model to develop a fast adaptation method that efficiently fine-tunes the original model with few examples and minimal computing resources. Extensive evidence of the capabilities of this framework is provided through experiments in different domains related to multimodal text and image understanding, including preference prediction as…
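The Bradley-Terry model scores a preference between two items as P(a preferred over b) = σ(r(a) − r(b)), and fitting it means minimizing −log σ(r(winner) − r(loser)) over observed pairs. The sketch below illustrates only that objective. It is not the paper's implementation: it assumes a frozen embedding (toy 2-D vectors standing in for CLIP image features) and a hypothetical linear reward head trained by plain gradient descent, whereas the paper fine-tunes the original model. The names `reward`, `fit_bt_head`, and the toy `pairs` are illustrative.

```python
import math


def bt_probability(r_a: float, r_b: float) -> float:
    """Bradley-Terry probability that item a is preferred over item b."""
    return 1.0 / (1.0 + math.exp(-(r_a - r_b)))


def reward(w: list[float], x: list[float]) -> float:
    """Hypothetical linear reward head on a frozen (assumed) embedding x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def fit_bt_head(pairs, dim: int, lr: float = 0.5, epochs: int = 200) -> list[float]:
    """Fit w by gradient descent on the mean Bradley-Terry negative log-likelihood.

    Each pair is (x_winner, x_loser). The per-pair loss is -log sigma(d) with
    d = r(x_winner) - r(x_loser); its gradient w.r.t. w is
    -(1 - sigma(d)) * (x_winner - x_loser).
    """
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for x_win, x_lose in pairs:
            p = bt_probability(reward(w, x_win), reward(w, x_lose))
            for k in range(dim):
                grad[k] -= (1.0 - p) * (x_win[k] - x_lose[k])
        w = [wk - lr * gk / len(pairs) for wk, gk in zip(w, grad)]
    return w


# Toy preference data (assumed): the user prefers images whose first
# embedding coordinate is larger.
pairs = [
    ([1.0, 0.2], [0.1, 0.3]),
    ([0.8, 0.9], [0.2, 0.1]),
    ([0.6, 0.0], [0.5, 0.7]),
]
w = fit_bt_head(pairs, dim=2)

# Use the fitted head as a reward model on an unseen pair.
p_new = bt_probability(reward(w, [0.9, 0.5]), reward(w, [0.1, 0.5]))
print(f"weights={w}, P(first preferred)={p_new:.3f}")
```

Because only a small head is trained on a handful of pairs, this toy setup mirrors the "few examples, little compute" spirit of the method; the fitted reward can then rank retrieval candidates or generated images.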
Taxonomy
Topics: Image Retrieval and Classification Techniques · Text and Document Classification Technologies · Multimodal Machine Learning Applications
Methods: Diffusion · Contrastive Language-Image Pre-training
