VALUEFLOW: Toward Pluralistic and Steerable Value-based Alignment in Large Language Models
Woojin Kim, Sieun Hyeon, Jusang Oh, Jaeyoung Do

TL;DR
This paper introduces VALUEFLOW, a comprehensive framework for extracting, evaluating, and steering large language models' alignment with human values, emphasizing hierarchical structure, calibrated intensity, and multi-value control.
Contribution
VALUEFLOW is the first unified system integrating hierarchical value embeddings, a large-scale value intensity database, and an intensity evaluator for LLM alignment.
Findings
Identified asymmetries in steerability across models and value theories.
Developed a scalable infrastructure for value intensity evaluation and control.
Conducted large-scale analysis across ten models and four value theories.
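Here is one way the steerability asymmetries listed above could be measured. This is a minimal sketch and not the paper's protocol. A model is prompted at several target intensities, an intensity evaluator scores each response, and two numbers are reported: (i) the Spearman rank correlation between target and measured intensity, as a graded-control score, and (ii) the mean absolute error computed separately for targets above and below a neutral level of 0. The intensity scale, the neutral level, the helper names, and all numbers are hypothetical.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def directional_error(targets: list[float], measured: list[float], neutral: float = 0.0) -> tuple[float, float]:
    """Mean absolute error for targets above vs. below the (hypothetical) neutral level."""
    up = [abs(m - t) for t, m in zip(targets, measured) if t > neutral]
    down = [abs(m - t) for t, m in zip(targets, measured) if t < neutral]
    return sum(up) / len(up), sum(down) / len(down)


# Hypothetical target intensities and evaluator scores for one model and one value.
targets = [-2.0, -1.0, 0.0, 1.0, 2.0]
measured = [-0.5, -0.3, 0.1, 1.1, 1.9]

rho = spearman(targets, measured)
up_err, down_err = directional_error(targets, measured)
print(f"spearman={rho:.2f} up_err={up_err:.2f} down_err={down_err:.2f}")
```

In this toy data the rank order is fully preserved, so rho is 1. Even so, the responses stay near neutral when negative intensities are requested. A rank-based score alone would hide this asymmetry, which is why the sketch also reports the directional errors.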
Abstract
Aligning Large Language Models (LLMs) with the diverse spectrum of human values remains a central challenge: preference-based methods often fail to capture deeper motivational principles. Value-based approaches offer a more principled path, yet three gaps persist: extraction often ignores hierarchical structure, evaluation detects presence but not calibrated intensity, and the steerability of LLMs at controlled intensities remains insufficiently understood. To address these limitations, we introduce VALUEFLOW, the first unified framework that spans extraction, evaluation, and steering with calibrated intensity control. The framework integrates three components: (i) HIVES, a hierarchical value embedding space that captures intra- and cross-theory value structure; (ii) the Value Intensity DataBase (VIDB), a large-scale resource of value-labeled texts with intensity estimates derived from…
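A toy example can make "intra- and cross-theory value structure" concrete. The sketch below is not the HIVES construction, whose details fall outside this excerpt. It uses hypothetical 3-dimensional embeddings, treats a higher-order value as the normalized mean of its child values, and compares each parent node with a value label from a second theory using cosine similarity. The two Schwartz higher-order groupings are real. The vectors and the second theory's "care" label are illustrative only.

```python
import math

Vector = list[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def parent_embedding(children: list[Vector]) -> Vector:
    """Assumption: a higher-order value is the normalized mean of its children."""
    dim = len(children[0])
    mean = [sum(c[i] for c in children) / len(children) for i in range(dim)]
    return normalize(mean)


# Two higher-order groups from Schwartz's theory; the vectors are made up.
schwartz = {
    "self-transcendence": {"benevolence": [0.9, 0.1, 0.0], "universalism": [0.8, 0.3, 0.1]},
    "self-enhancement": {"achievement": [0.1, 0.9, 0.2], "power": [0.0, 0.8, 0.4]},
}
# A value label from a second, hypothetical theory.
other_theory = {"care": [0.7, 0.25, 0.1]}

parents = {name: parent_embedding(list(kids.values())) for name, kids in schwartz.items()}
for name, vec in parents.items():
    print(name, round(cosine(vec, other_theory["care"]), 3))
```

With these numbers, "care" lies much closer to self-transcendence than to self-enhancement. A shared embedding space could make this kind of cross-theory relationship explicit.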
Peer Reviews
Decision: Submitted to ICLR 2026
- The ranking-based value evaluation is a novel and timely contribution. It is a promising direction for overcoming the reliability and consistency issues of prior evaluation methods.
- Unifying heterogeneous value theories is an interesting and pioneering attempt.
- The large-scale intensity database, once open-sourced, will be a significant contribution to the community.
The methodological section is hard to follow. For example, it is unclear what the motivation and theoretical basis are for the two-stage training process (Section 4.3). How does the unified taxonomy contribute to the value evaluation? What is the relationship between the anchors in Section 4.3 and those in Section 5.2? In Figure 3, how are parts (a) and (b) used together synergistically? What is the theoretical justification for using the Plackett–Luce model among all possible scales? Is th…
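The review asks why a Plackett–Luce (PL) model is used, so a short reminder of how PL turns rankings into scores may help. Under PL, each item i has a positive worth w_i = exp(s_i). A ranking is generated by repeatedly choosing the next item with probability proportional to its worth among the items not yet placed. The sketch below fits the scores s_i by gradient ascent on the PL log-likelihood. The four texts and their rankings are hypothetical. Plain gradient ascent is an illustrative choice (minorization-maximization updates are a common alternative) and is not necessarily how the paper fits its model.

```python
import math


def pl_log_likelihood(scores: list[float], ranking: list[int]) -> float:
    """Log-probability of `ranking` (highest intensity first) under Plackett-Luce,
    with item worths w_i = exp(scores[i])."""
    ll = 0.0
    for k in range(len(ranking)):
        remaining = ranking[k:]
        ll += scores[ranking[k]] - math.log(sum(math.exp(scores[i]) for i in remaining))
    return ll


def pl_gradient(scores: list[float], rankings: list[list[int]]) -> list[float]:
    """Gradient of the summed log-likelihood with respect to the scores."""
    grad = [0.0] * len(scores)
    for ranking in rankings:
        for k in range(len(ranking)):
            remaining = ranking[k:]
            denom = sum(math.exp(scores[i]) for i in remaining)
            grad[ranking[k]] += 1.0  # the item actually chosen at stage k
            for i in remaining:
                grad[i] -= math.exp(scores[i]) / denom  # its expected choice probability
    return grad


def fit_pl(num_items: int, rankings: list[list[int]], lr: float = 0.05, steps: int = 3000) -> list[float]:
    """Maximum-likelihood scores via gradient ascent, centred to mean zero
    because PL scores are only identified up to an additive constant."""
    scores = [0.0] * num_items
    for _ in range(steps):
        grad = pl_gradient(scores, rankings)
        scores = [s + lr * g for s, g in zip(scores, grad)]
        mean = sum(scores) / num_items
        scores = [s - mean for s in scores]
    return scores


# Hypothetical annotator rankings of four texts by the intensity of one value.
rankings = [[0, 1, 2, 3], [0, 2, 1, 3], [1, 0, 2, 3], [0, 1, 3, 2]]
scores = fit_pl(4, rankings)
order = sorted(range(4), key=lambda i: -scores[i])
print("scores:", [round(s, 2) for s in scores], "order:", order)
```

The maximum-likelihood scores are finite only when the "who ranked above whom" graph is strongly connected. That holds for these toy rankings. With sparser real data, some regularization or prior on the scores would be needed.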
1. This paper proposes VALUEFLOW, a unified framework spanning value extraction, evaluation, and steering in LLMs, enabling end-to-end steerable value alignment.
2. To address the challenge of value extraction, it constructs HIVES, a hierarchical value embedding space, to unify heterogeneous value theories.
3. To account for value intensity and the instability of current rating-based evaluations, the paper builds a value intensity database and designs a ranking-based intensity evaluation method.
4. Some experiment…
1. Baselines for value extraction in open-ended conversational contexts are largely ignored in both the Introduction (Line 52) and the Related Work section. I believe there is prior work on this task.
2. The overall method needs clearer exposition:
   - A structured algorithm is desired to formalize the whole framework, especially how the hierarchical value embedding space is built.
   - More description is required for the two-stage training process in Sec 4.3: what are the inputs and what are the outputs…
1. Problem Focus and Contribution Scope:
   - Addresses three critical gaps in value alignment research: extraction lacks hierarchical structure, evaluation detects presence but not intensity, and steerability remains insufficiently understood. The problem formulation is clear and important.
   - First to propose "steerability with intensity," extending value alignment from directional control to graded intensity control and opening a new dimension for pluralistic value alignment.
2. Method Mec…
- Technical Correctness: The Plackett-Luce model assumes Independence of Irrelevant Alternatives (IIA), but value judgments may exhibit context-dependent effects. The paper does not discuss robustness when this assumption is violated.
- Evaluation Scope: Experiments mainly focus on short-term, prompt-driven value steering and do not explore the stability of value expression in long-term dialogues (e.g., whether the model drifts from the target intensity over multi-turn interactions). They also…
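The IIA concern can be stated precisely. Under PL, the probability that item i is chosen first from a set S is w_i divided by the sum of worths in S. The odds of i over j are therefore w_i / w_j, no matter which other items S contains. The sketch below first checks this property for hypothetical worths. It then computes a simple diagnostic on hypothetical observed choice counts: the absolute gap in log-odds between a pairwise and a three-way choice set, which should be near zero if IIA holds. This diagnostic is an illustrative assumption, not a test proposed by the paper or the review.

```python
import math


def first_choice_probs(worths: dict[str, float], choice_set: list[str]) -> dict[str, float]:
    """PL probability that each item in `choice_set` is ranked first."""
    total = sum(worths[i] for i in choice_set)
    return {i: worths[i] / total for i in choice_set}


def log_odds_gap(counts_a: dict[str, int], counts_b: dict[str, int], i: str, j: str) -> float:
    """|log(i:j odds in set A) - log(i:j odds in set B)|; near zero if IIA holds."""
    return abs(math.log(counts_a[i] / counts_a[j]) - math.log(counts_b[i] / counts_b[j]))


# Hypothetical worths for three texts.
worths = {"a": 3.0, "b": 1.0, "c": 2.0}
pair = first_choice_probs(worths, ["a", "b"])
triple = first_choice_probs(worths, ["a", "b", "c"])
print(pair["a"] / pair["b"], triple["a"] / triple["b"])  # both equal w_a / w_b up to float rounding

# Hypothetical annotator counts: adding "c" flips the a-vs-b preference,
# a context effect that a single PL worth per item cannot represent.
observed_pair = {"a": 60, "b": 40}
observed_triple = {"a": 30, "b": 40, "c": 30}
print(round(log_odds_gap(observed_pair, observed_triple, "a", "b"), 3))
```

Here the gap equals log 2, meaning the a:b odds halve once c is added. On real data, a sampling-uncertainty estimate would be needed before reading such a gap as an IIA violation.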
Taxonomy
Topics: Computational and Text Analysis Methods · Topic Modeling · Natural Language Processing Techniques
