Ethics2vec: aligning automatic agents and human preferences
Gianluca Bontempi

TL;DR
Ethics2Vec represents an AI agent's decision-making strategy as a vector embedding so that it can be compared with human ethical values, addressing the challenge of aligning AI systems with complex, incommensurable human values.
Contribution
The paper extends the Anything2Vec approach to ethics, proposing a method that vectorizes automatic agents so that their alignment with human values can be assessed.
Findings
Proposes Ethics2Vec for ethical alignment of AI agents.
Demonstrates vectorization of binary decision-making.
Extends approach to automatic control systems.
Abstract
Though intelligent agents are supposed to improve human experience (or make it more efficient), it is hard from a human perspective to grasp the ethical values which are explicitly or implicitly embedded in an agent's behaviour. This is the well-known problem of alignment, which refers to the challenge of designing AI systems that align with human values, goals and preferences. This problem is particularly challenging since most human ethical considerations refer to incommensurable (i.e. non-measurable and/or incomparable) values and criteria. Consider, for instance, a medical agent prescribing a treatment to a cancer patient. How could it take into account (and/or weigh) incommensurable aspects like the value of a human life and the cost of the treatment? Now, the alignment between human and artificial values is possible only if we define a common space where a metric can be…
Taxonomy
Topics: Multi-Agent Systems and Negotiation · Logic, Reasoning, and Knowledge · Reinforcement Learning in Robotics
