BACH-V: Bridging Abstract and Concrete Human-Values in Large Language Models
Junyu Zhang, Yipeng Kang, Jiong Guo, Jiayu Zhan, Junqi Wang

TL;DR
This paper investigates how large language models understand and utilize human values by analyzing their internal representations, demonstrating that they maintain structured, transferable value concepts bridging abstract understanding and concrete decision-making.
Contribution
The paper introduces an abstraction-grounding framework and empirical methods to analyze how LLMs represent and manipulate human values across different levels of abstraction.
Findings
Probes detect consistent value traces across abstract and concrete contexts.
Interventions on value representations causally influence concrete decisions.
Abstract value interpretations remain stable despite manipulations.
Abstract
Do large language models (LLMs) genuinely understand abstract concepts, or merely manipulate them as statistical patterns? We introduce an abstraction-grounding framework that decomposes conceptual understanding into three capacities: interpretation of abstract concepts (Abstract-Abstract, A-A), grounding of abstractions in concrete events (Abstract-Concrete, A-C), and application of abstract principles to regulate concrete decisions (Concrete-Concrete, C-C). Using human values as a testbed - given their semantic richness and centrality to alignment - we employ probing (detecting value traces in internal activations) and steering (modifying representations to shift behavior). Across six open-source LLMs and ten value dimensions, probing shows that diagnostic probes trained solely on abstract value descriptions reliably detect the same values in concrete event narratives and decision…
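The probing and steering ideas described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration, not the authors' code or data: the activations are synthetic Gaussian vectors, `DIM`, `value_dir`, `synth_activations`, `train_probe`, and `steer` are hypothetical names, and the probe's readout stands in for model behavior, which a real experiment would measure on an actual LLM. The sketch trains a linear probe only on "abstract" activations, checks whether it transfers to "concrete" activations that carry a context shift, and then shifts activations along the probe direction to see the readout change.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 32  # hypothetical hidden size, chosen only for this toy example

# Hypothetical unit-norm "value direction" shared by both contexts.
value_dir = rng.normal(size=DIM)
value_dir /= np.linalg.norm(value_dir)

def synth_activations(n, offset):
    """Synthetic stand-in for hidden states; label 1 lies along +value_dir."""
    labels = rng.integers(0, 2, size=n)
    x = rng.normal(size=(n, DIM)) + offset
    x += 2.0 * np.outer(2 * labels - 1, value_dir)
    return x, labels

def train_probe(x, y, lr=0.1, steps=500):
    """Logistic-regression probe fitted with plain gradient descent."""
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        grad = p - y
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def probe_predict(x, w, b):
    """Binary readout of the probe: 1 if the value is detected."""
    return (x @ w + b > 0).astype(int)

def steer(x, direction, alpha):
    """Shift activations by alpha along the normalized direction."""
    return x + alpha * direction / np.linalg.norm(direction)

# Probing: train on "abstract" activations, evaluate on "concrete" ones
# that carry a small context-specific offset.
abstract_x, abstract_y = synth_activations(400, offset=np.zeros(DIM))
concrete_offset = rng.normal(scale=0.2, size=DIM)
concrete_x, concrete_y = synth_activations(400, offset=concrete_offset)

w, b = train_probe(abstract_x, abstract_y)
transfer_acc = (probe_predict(concrete_x, w, b) == concrete_y).mean()

# Steering: push value-absent concrete activations along the probe direction
# and compare how often the probe reads the value as present.
absent = concrete_x[concrete_y == 0]
base_rate = probe_predict(absent, w, b).mean()
steered_rate = probe_predict(steer(absent, w, alpha=4.0), w, b).mean()

print(f"abstract-to-concrete probe accuracy: {transfer_acc:.3f}")
print(f"value detected before steering: {base_rate:.3f}, after: {steered_rate:.3f}")
```

In this toy setting the transfer works by construction, because both contexts share `value_dir`. The empirical question the paper addresses is whether real LLM activations behave this way.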
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Computational and Text Analysis Methods
