Self-supervised Attribute-aware Dynamic Preference Ranking Alignment

Hongyu Yang; Qi Zhao; Zhenhua Hu; Rui Li

arXiv:2502.12189·cs.CL·February 19, 2025

TL;DR

This paper introduces SeAdpra, a self-supervised, attribute-aware method for dynamic preference ranking that improves list-level alignment in response generation without relying on costly human annotations.

Contribution

It proposes a novel self-supervised approach using Attribute-Perceptual Distance Factors for fine-grained preference learning and introduces scalable evaluation metrics and a challenging dataset.

Findings

01 SeAdpra outperforms existing methods on multiple datasets.

02 It achieves better alignment with human preferences.

03 The approach demonstrates strong generalizability across domains.

Abstract

Reinforcement Learning from Human Feedback and its variants excel in aligning with human intentions to generate helpful, harmless, and honest responses. However, most of them rely on costly human-annotated pairwise comparisons for supervised alignment, which is not suitable for list-level scenarios, such as community question answering. Additionally, human preferences are influenced by multiple intrinsic factors in responses, leading to decision-making inconsistencies. Therefore, we propose Self-supervised Attribute-aware dynamic preference ranking, called SeAdpra. It quantifies preference differences between responses based on Attribute-Perceptual Distance Factors (APDF) and dynamically determines the list-wise alignment order. Furthermore, it achieves fine-grained preference difference learning and enables precise alignment with the…
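
To make the list-wise idea in the abstract concrete, here is a minimal sketch of ordering a list of responses by attribute scores and then scoring a model against that order. It is not the authors' implementation. The abstract does not define APDF, so the weighted sum in `rank_by_attributes`, the `weights` argument, and the example attribute values are all hypothetical stand-ins. The loss is the standard Plackett-Luce (ListMLE) negative log-likelihood, swapped in as a generic list-wise objective rather than the SeAdpra loss.

```python
import math


def rank_by_attributes(attr_scores, weights):
    """Order response indices from most to least preferred.

    attr_scores: one list of per-attribute scores for each response.
    weights: hypothetical attribute weights (an assumption, not APDF).
    """
    combined = [
        sum(w * s for w, s in zip(weights, scores)) for scores in attr_scores
    ]
    return sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)


def listmle_loss(model_scores, order):
    """Plackett-Luce negative log-likelihood of `order` under `model_scores`."""
    loss = 0.0
    for k in range(len(order)):
        # Scores of the responses not yet placed in the ranking.
        remaining = [model_scores[i] for i in order[k:]]
        # Log-sum-exp with the max subtracted for numerical stability.
        m = max(remaining)
        log_z = m + math.log(sum(math.exp(s - m) for s in remaining))
        loss += log_z - model_scores[order[k]]
    return loss


# Three candidate answers, each scored on two hypothetical attributes.
attr_scores = [[0.2, 0.9], [0.8, 0.7], [0.5, 0.1]]
order = rank_by_attributes(attr_scores, weights=[0.5, 0.5])
loss = listmle_loss([1.0, 2.0, 0.0], order)
```

The loss shrinks as the model's scores agree more closely with the attribute-derived order, which is the sense in which a list-wise objective aligns a whole ranking at once instead of one pair at a time.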

Taxonomy

Topics: Data Management and Algorithms · Rough Sets and Fuzzy Logic