Offline Clustering of Preference Learning with Active-data Augmentation
Jingyuan Liu, Fatemeh Ghaffari, Xuchuang Wang, Xutong Liu, Mohammad Hajiesmaili, Carlee Joe-Wong

TL;DR
This paper introduces offline preference-learning methods that cluster users with similar preferences so their data can be pooled, and that actively select additional data, addressing both imbalanced offline data and heterogeneous user preferences.
Contribution
It proposes two algorithms: Off-C2PL, which clusters users by preference from offline data and comes with theoretical bounds, and A2-Off-C2PL, which actively augments the data based on the learned clusters.
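To make the clustering idea concrete, here is a minimal sketch. It is not the paper's Off-C2PL algorithm. It assumes a Bradley-Terry-style logistic model in which each comparison is summarized by a feature-difference vector z and a binary label y, fits one parameter vector per user, and groups users whose estimates lie within a fixed distance of each other. The names fit_user, cluster_users, simulate, and the fixed threshold are all hypothetical; in particular, a fixed threshold ignores how much data each user has, which a method designed for imbalanced data would need to account for.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def fit_user(data, dim, steps=300, lr=1.0, ridge=1e-3):
    """Ridge-regularised logistic fit from one user's offline comparisons.

    data: list of (z, y); z is the feature difference of the compared pair,
    y is 1 if the first item was preferred and 0 otherwise (assumed format).
    """
    theta = [0.0] * dim
    n = len(data)
    for _ in range(steps):
        # Gradient of the averaged log-likelihood minus the ridge penalty.
        grad = [-ridge * t for t in theta]
        for z, y in data:
            err = y - sigmoid(sum(t * zi for t, zi in zip(theta, z)))
            for k in range(dim):
                grad[k] += err * z[k] / n
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta


def cluster_users(estimates, threshold):
    """Link users whose estimates are within `threshold` (Euclidean distance)
    and return the connected components as sorted lists of user indices."""
    n = len(estimates)
    seen, clusters = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            i = stack.pop()
            component.append(i)
            for j in range(n):
                if j not in seen and math.dist(estimates[i], estimates[j]) <= threshold:
                    seen.add(j)
                    stack.append(j)
        clusters.append(sorted(component))
    return clusters


def simulate(theta, n, rng):
    """Draw n synthetic comparisons from the assumed logistic model."""
    data = []
    for _ in range(n):
        z = [rng.uniform(-1.0, 1.0) for _ in theta]
        p = sigmoid(sum(t * zi for t, zi in zip(theta, z)))
        data.append((z, 1 if rng.random() < p else 0))
    return data
```

Once users are grouped, the comparisons of all users in a cluster can be pooled and passed to fit_user again, which is the sense in which clustering improves data utilization.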
Findings
Off-C2PL comes with theoretical suboptimality bounds.
Active data augmentation improves the efficiency of preference learning.
Both methods are validated on synthetic and real-world datasets.
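The intuition behind active data augmentation can be sketched as follows. This is a generic uncertainty-based selection rule, assumed here for illustration, and not the paper's A2-Off-C2PL procedure. It keeps a regularised information matrix V built from the feature-difference vectors already in the (pooled) offline data and greedily requests the candidate comparison z with the largest value of z^T V^{-1} z, that is, the direction the existing data covers least. The function names and the ridge parameter are hypothetical.

```python
def sherman_morrison(v_inv, z):
    """Return (V + z z^T)^{-1} given the symmetric matrix V^{-1}."""
    d = len(z)
    vz = [sum(v_inv[i][j] * z[j] for j in range(d)) for i in range(d)]
    denom = 1.0 + sum(z[i] * vz[i] for i in range(d))
    return [[v_inv[i][j] - vz[i] * vz[j] / denom for j in range(d)]
            for i in range(d)]


def uncertainty(v_inv, z):
    """Quadratic form z^T V^{-1} z: large when z is poorly covered."""
    d = len(z)
    return sum(z[i] * v_inv[i][j] * z[j] for i in range(d) for j in range(d))


def select_pairs(offline, candidates, budget, ridge=1.0):
    """Greedily pick `budget` candidate indices with maximal uncertainty.

    offline:    feature-difference vectors already observed
    candidates: feature-difference vectors that could be queried
    """
    d = len(candidates[0])
    # Start from V = ridge * I, so V^{-1} = I / ridge.
    v_inv = [[(1.0 / ridge) if i == j else 0.0 for j in range(d)]
             for i in range(d)]
    for z in offline:
        v_inv = sherman_morrison(v_inv, z)
    chosen = []
    for _ in range(budget):
        best = max(range(len(candidates)),
                   key=lambda i: uncertainty(v_inv, candidates[i]))
        chosen.append(best)
        # Account for the newly requested comparison before picking again.
        v_inv = sherman_morrison(v_inv, candidates[best])
    return chosen
```

For example, if the offline data contains only comparisons along the first feature dimension, the rule favors candidates along the second dimension, which matches the setting where some preference dimensions are underrepresented.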
Abstract
Preference learning from pairwise feedback is a widely adopted framework in applications such as reinforcement learning with human feedback and recommendations. In many practical settings, however, user interactions are limited or costly, making offline preference learning necessary. Moreover, real-world preference learning often involves users with different preferences. For example, annotators from different backgrounds may rank the same responses differently. This setting presents two central challenges: (1) identifying similarity across users to effectively aggregate data, especially under scenarios where offline data is imbalanced across dimensions, and (2) handling the imbalanced offline data where some preference dimensions are underrepresented. To address these challenges, we study the Offline Clustering of Preference Learning problem, where the learner has access to fixed…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Recommender Systems and Techniques
