Configurable Preference Tuning with Rubric-Guided Synthetic Data
Víctor Gallego

TL;DR
This paper introduces Configurable Preference Tuning (CPT), a framework in which a language model is fine-tuned on synthetic, rubric-guided preference data so that it can adapt its behavior at inference time to human-interpretable directives, without further retraining.
Contribution
CPT allows models to modulate outputs according to explicit preferences via synthetic, rubric-guided data, enhancing flexibility and nuance in human feedback modeling.
Findings
CPT enables dynamic behavior adjustment at inference time.
Rubric-guided synthetic data improves preference control.
Models show improved alignment with human-defined attributes.
Abstract
Models of human feedback for AI alignment, such as those underpinning Direct Preference Optimization (DPO), often bake in a singular, static set of preferences, limiting adaptability. This paper challenges the assumption of monolithic preferences by introducing Configurable Preference Tuning (CPT), a novel framework for endowing language models with the ability to dynamically adjust their behavior based on explicit, human-interpretable directives. CPT leverages synthetically generated preference data, conditioned on system prompts derived from structured, fine-grained rubrics that define desired attributes like writing style. By fine-tuning with these rubric-guided preferences, the LLM learns to modulate its outputs at inference time in response to the system prompt, without retraining. This approach not only offers fine-grained control but also provides a mechanism for modeling more…
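As a rough illustration of what "preference data conditioned on system prompts" could look like, the sketch below builds two preference records from one user prompt. It rests on an assumption not stated in the abstract: that the same two responses are reused with the chosen and rejected roles swapped under opposing system prompts. The class `PreferenceExample`, the function `build_configurable_pairs`, and all example strings are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreferenceExample:
    """One preference record; field names are illustrative, not from the paper."""
    system: str    # directive derived from a rubric (hypothetical wording)
    prompt: str    # user prompt shared by both responses
    chosen: str    # response preferred under this system prompt
    rejected: str  # response dispreferred under this system prompt


def build_configurable_pairs(prompt, system_a, response_a, system_b, response_b):
    """Assumed pairing scheme: each response is 'chosen' under the system
    prompt it matches and 'rejected' under the opposing one."""
    return [
        PreferenceExample(system_a, prompt, response_a, response_b),
        PreferenceExample(system_b, prompt, response_b, response_a),
    ]


pairs = build_configurable_pairs(
    prompt="Describe a rainy morning.",
    system_a="Write in an ornate, highly figurative style.",
    response_a="Rain stitched silver threads across the waking rooftops.",
    system_b="Write in a plain, literal style.",
    response_b="It rained in the morning and the roofs got wet.",
)

for example in pairs:
    print(f"[{example.system}] prefers: {example.chosen}")
```

Records of this shape (system prompt, prompt, chosen, rejected) are the kind of input a DPO-style trainer typically consumes. Because the preferred response flips with the system prompt, the preference signal depends on the directive rather than on a single fixed ordering.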
Taxonomy
Topics: Semantic Web and Ontologies · Advanced Database Systems and Queries
