HAVA: Hybrid Approach to Value-Alignment through Reward Weighing for Reinforcement Learning

Kryspin Varys; Federico Cerutti; Adam Sobey; Timothy J. Norman

arXiv:2505.15011·cs.AI·May 22, 2025

TL;DR

This paper introduces HAVA, a novel reinforcement learning method that combines explicit and implicit norm representations by monitoring agent compliance and adjusting rewards to promote value-aligned behavior.

Contribution

The authors present HAVA as a novel approach that integrates explicitly represented norms (legal and safety norms) and implicitly learned norms (social norms) into reinforcement learning by weighing rewards according to the agent's reputation.
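To make the idea of reputation-based reward weighing concrete, here is a minimal sketch. It is not taken from the paper or the kvarys/HAVA repository: the class `ReputationTracker`, its `penalty` and `recovery` parameters, the [0, 1] reputation range, and the update rule are all hypothetical choices made for illustration. The only element borrowed from the description above is that a compliance signal is summarized in a reputation value, which then scales the reward the agent receives.

```python
from dataclasses import dataclass


@dataclass
class ReputationTracker:
    """Hypothetical compliance monitor; NOT the update rule from the paper."""

    reputation: float = 1.0  # assumed range [0, 1], starting fully trusted
    penalty: float = 0.5     # assumed multiplicative drop after a violation
    recovery: float = 0.1    # assumed additive recovery after a compliant step

    def update(self, complied: bool) -> float:
        """Raise reputation on compliance, cut it on a norm violation."""
        if complied:
            self.reputation = min(1.0, self.reputation + self.recovery)
        else:
            self.reputation *= self.penalty
        return self.reputation

    def weigh(self, reward: float) -> float:
        """Scale the environment reward by the current reputation."""
        return self.reputation * reward


def weighted_rewards(rewards, compliance, tracker=None):
    """Apply reputation weighing to a trajectory of (reward, complied) pairs."""
    tracker = tracker or ReputationTracker()
    weighted = []
    for reward, complied in zip(rewards, compliance):
        tracker.update(complied)
        weighted.append(tracker.weigh(reward))
    return weighted


if __name__ == "__main__":
    # One violation in the middle of three otherwise identical steps.
    print(weighted_rewards([1.0, 1.0, 1.0], [True, False, True]))
```

In this sketch a single violation halves the reward that follows it, and compliant steps restore it only gradually, so a reward-maximizing agent has an incentive to stay compliant. Simple multiplication like this assumes non-negative rewards; with negative rewards, a low reputation would shrink penalties instead, so a different weighing scheme would be needed.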

Findings

01 HAVA effectively promotes value-aligned policies in complex environments.

02 Combining explicit and implicit norms yields better alignment than using either alone.

03 Experiments demonstrate the importance of both norm types in reinforcement learning.

Abstract

Our society is governed by a set of norms which together bring about the values we cherish such as safety, fairness or trustworthiness. The goal of value-alignment is to create agents that not only do their tasks but through their behaviours also promote these values. Many of the norms are written as laws or rules (legal / safety norms) but even more remain unwritten (social norms). Furthermore, the techniques used to represent these norms also differ. Safety / legal norms are often represented explicitly, for example, in some logical language while social norms are typically learned and remain hidden in the parameter space of a neural network. There is a lack of approaches in the literature that could combine these various norm representations into a single algorithm. We propose a novel method that integrates these norms into the reinforcement learning process. Our method monitors the…

Code & Models

Repositories

kvarys/HAVA (PyTorch, official)

Taxonomy

Topics: Assembly Line Balancing Optimization · Flexible and Reconfigurable Manufacturing Systems

Methods: Sparse Evolutionary Training