GEM: Generative Entropy-Guided Preference Modeling for Few-shot Alignment of LLMs

Yiyang Zhao; Huiyu Bai; Xuejiao Zhao

arXiv:2511.13007 · cs.AI · November 18, 2025

TL;DR

GEM introduces an entropy-guided, self-optimization approach for aligning large language models with human preferences in low-resource and domain-specific settings, reducing reliance on large annotated datasets.

Contribution

The paper proposes a generative entropy-guided preference modeling framework that trains LLMs to internalize preference signals without extensive supervision.

Findings

01 Significant performance improvements on benchmarks with few-shot preference data.

02 Effective alignment in domain-specific tasks such as medical dialogue and mathematical reasoning.

03 Entropy-based self-evaluation is shown to be viable for LLM alignment.

Abstract

Alignment of large language models (LLMs) with human preferences typically relies on supervised reward models or external judges that demand abundant annotations. However, in fields that rely on professional knowledge, such as medicine and law, such large-scale preference labels are often unobtainable. In this paper, we propose GEM, a generative entropy-guided preference modeling approach for LLM alignment in low-resource and domain-specific scenarios. Instead of training a discriminative reward model on preference data, we directly train the LLM to internalize a closed-loop optimization architecture that can extract and exploit the multi-dimensional, fine-grained cognitive signals implicit in human preferences. Specifically, our Cognitive Filtering module, based on entropy theory in decision making, first leverages Chain-of-Thought (CoT) prompting to generate diverse candidate…
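The abstract is cut off before it explains how entropy is applied to the candidates, so the following is only a generic illustration of entropy-based scoring and should not be read as GEM's Cognitive Filtering module. It assumes each candidate reasoning chain comes with per-step next-token probability distributions, and it ranks candidates by mean Shannon entropy, lowest (most confident) first. The function names and the choice of mean entropy as the score are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a single next-token distribution."""
    # Zero-probability entries contribute nothing and are skipped to avoid log(0).
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_entropy(step_distributions: Sequence[Sequence[float]]) -> float:
    """Average per-step entropy over one candidate (assumed scoring rule)."""
    if not step_distributions:
        raise ValueError("candidate has no generation steps")
    total = sum(token_entropy(d) for d in step_distributions)
    return total / len(step_distributions)


def rank_by_entropy(candidates: Dict[str, List[List[float]]]) -> List[str]:
    """Return candidate names sorted from lowest to highest mean entropy."""
    return sorted(candidates, key=lambda name: mean_entropy(candidates[name]))


if __name__ == "__main__":
    # Toy distributions over a two-token vocabulary (illustrative values only).
    toy_candidates = {
        "uncertain": [[0.5, 0.5], [0.4, 0.6]],
        "confident": [[0.9, 0.1], [0.8, 0.2]],
    }
    for name in rank_by_entropy(toy_candidates):
        print(name, round(mean_entropy(toy_candidates[name]), 4))
```

In practice the distributions would come from the model's softmax outputs at each generated token. Whether low entropy should be preferred, or how it is combined with other signals, depends on the method and cannot be inferred from the truncated abstract.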

Videos

GEM: Generative Entropy-Guided Preference Modeling for Few-Shot Alignment of LLMs · Underline

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)