DPO Kernels: A Semantically-Aware, Kernel-Enhanced, and Divergence-Rich Paradigm for Direct Preference Optimization

Amitava Das; Suranjana Trivedy; Danush Khanna; Rajarshi Roy; Gurpreet Singh; Basab Ghosh; Yaswanth Narsupalli; Vinija Jain; Vasu Sharma; Aishwarya Naresh Reganti; Aman Chadha

arXiv:2501.03271 · cs.LG · January 22, 2025

TL;DR

DPO-Kernels introduces a kernel-enhanced, divergence-rich framework for direct preference optimization, significantly improving alignment of large language models with diverse values through richer transformations and adaptive selection.

Contribution

The paper proposes a kernel-based extension of DPO that offers multiple divergence options and data-driven selection of the kernel-divergence pair, with the aim of advancing the state of the art in LLM alignment.

Findings

01. Achieves state-of-the-art results on 12 datasets

02. Enhances robustness in factuality and safety

03. Improves instruction-following performance

Abstract

The rapid rise of large language models (LLMs) has unlocked many applications but also underscores the challenge of aligning them with diverse values and preferences. Direct Preference Optimization (DPO) is central to alignment but constrained by fixed divergences and limited feature transformations. We propose DPO-Kernels, which integrates kernel methods to address these issues through four key contributions: (i) Kernelized Representations with polynomial, RBF, Mahalanobis, and spectral kernels for richer transformations, plus a hybrid loss combining embedding-based and probability-based objectives; (ii) Divergence Alternatives (Jensen-Shannon, Hellinger, Renyi, Bhattacharyya, Wasserstein, and f-divergences) for greater stability; (iii) Data-Driven Selection metrics that automatically choose the best kernel-divergence pair; and (iv) a Hierarchical Mixture of Kernels for both local…
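
As a rough illustration of the building blocks the abstract names, the sketch below implements two of the listed kernels (RBF and polynomial) and three of the listed divergences (Jensen-Shannon, Hellinger, Bhattacharyya) using their standard textbook definitions. It is not the authors' implementation: the function names, the default parameters (`sigma`, `degree`, `c`), and the use of plain Python lists for embeddings and probability vectors are all assumptions made for this example, and it does not show how the paper combines these pieces into its loss.

```python
import math


def rbf_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel: exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def polynomial_kernel(x, y, degree=2, c=1.0):
    """Polynomial kernel: (<x, y> + c)^degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + c) ** degree


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: the average KL of p and q to their midpoint."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def hellinger_distance(p, q):
    """Hellinger distance, bounded in [0, 1]."""
    total = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(0.5 * total)


def bhattacharyya_distance(p, q):
    """Bhattacharyya distance: -log of the Bhattacharyya coefficient."""
    coefficient = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return -math.log(coefficient)


if __name__ == "__main__":
    # Hypothetical embeddings and next-token distributions, for illustration only.
    chosen_embedding, rejected_embedding = [0.2, 0.9], [0.8, 0.1]
    policy_probs, reference_probs = [0.7, 0.2, 0.1], [0.5, 0.3, 0.2]

    print("RBF similarity:", rbf_kernel(chosen_embedding, rejected_embedding))
    print("Polynomial similarity:", polynomial_kernel(chosen_embedding, rejected_embedding))
    print("Jensen-Shannon:", js_divergence(policy_probs, reference_probs))
    print("Hellinger:", hellinger_distance(policy_probs, reference_probs))
    print("Bhattacharyya:", bhattacharyya_distance(policy_probs, reference_probs))
```

Unlike KL divergence, the Jensen-Shannon divergence and the Hellinger distance are symmetric and bounded, which is one common reason such alternatives are considered when stability matters.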

Taxonomy

Topics: Constraint Satisfaction and Optimization · Data Management and Algorithms

Methods: Radial Basis Function