DPO Kernels: A Semantically-Aware, Kernel-Enhanced, and Divergence-Rich Paradigm for Direct Preference Optimization
Amitava Das, Suranjana Trivedy, Danush Khanna, Rajarshi Roy, Gurpreet Singh, Basab Ghosh, Yaswanth Narsupalli, Vinija Jain, Vasu Sharma, Aishwarya Naresh Reganti, Aman Chadha

TL;DR
DPO-Kernels introduces a kernel-enhanced, divergence-rich framework for direct preference optimization, significantly improving alignment of large language models with diverse values through richer transformations and adaptive selection.
Contribution
It proposes a novel kernel-based approach with multiple divergence options and data-driven selection, advancing the state-of-the-art in LLM alignment techniques.
Findings
Achieves state-of-the-art results on 12 datasets
Enhances robustness in factuality and safety
Improves instruction-following performance
Abstract
The rapid rise of large language models (LLMs) has unlocked many applications but also underscores the challenge of aligning them with diverse values and preferences. Direct Preference Optimization (DPO) is central to alignment but constrained by fixed divergences and limited feature transformations. We propose DPO-Kernels, which integrates kernel methods to address these issues through four key contributions: (i) Kernelized Representations with polynomial, RBF, Mahalanobis, and spectral kernels for richer transformations, plus a hybrid loss combining embedding-based and probability-based objectives; (ii) Divergence Alternatives (Jensen-Shannon, Hellinger, Renyi, Bhattacharyya, Wasserstein, and f-divergences) for greater stability; (iii) Data-Driven Selection metrics that automatically choose the best kernel-divergence pair; and (iv) a Hierarchical Mixture of Kernels for both local…
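To make the kernel and divergence families named in the abstract concrete, here is a minimal sketch of two of the kernels (polynomial, RBF) and three of the divergences (Jensen-Shannon, Hellinger, Bhattacharyya) using their standard textbook definitions. The function names, default hyperparameters, and the use of plain Python lists are assumptions made for illustration; the abstract does not specify how the paper parameterizes these quantities or how they enter the DPO loss, so this is not the authors' implementation.

```python
import math

# Standard textbook definitions; names and defaults are illustrative assumptions.

def polynomial_kernel(x, y, degree=2, c=1.0):
    """Polynomial kernel: (x . y + c) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + c) ** degree

def rbf_kernel(x, y, sigma=1.0):
    """RBF (Gaussian) kernel: exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions, skipping zero-mass terms of p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jensen_shannon(p, q):
    """Jensen-Shannon divergence (natural log): symmetric, bounded by log 2."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def hellinger(p, q):
    """Hellinger distance: lies in [0, 1] for probability vectors."""
    s = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(0.5 * s)

def bhattacharyya(p, q):
    """Bhattacharyya distance: -log of the Bhattacharyya coefficient."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return -math.log(bc)

if __name__ == "__main__":
    x, y = [1.0, 2.0], [3.0, 4.0]   # toy embedding vectors
    p, q = [0.7, 0.3], [0.4, 0.6]   # toy discrete distributions
    print("polynomial:", polynomial_kernel(x, y))
    print("rbf:", rbf_kernel(x, y))
    print("JS:", jensen_shannon(p, q))
    print("Hellinger:", hellinger(p, q))
    print("Bhattacharyya:", bhattacharyya(p, q))
```

Unlike KL divergence, Jensen-Shannon and Hellinger are symmetric and bounded, which is one plausible reading of the abstract's claim that alternative divergences give greater stability.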
Taxonomy
Topics: Constraint Satisfaction and Optimization · Data Management and Algorithms
Methods: Radial Basis Function
