Aligning AI Agents via Information-Directed Sampling

Hong Jun Jeon; Benjamin Van Roy

arXiv:2410.14807·cs.LG·October 22, 2024

TL;DR

This paper introduces a bandit alignment framework for long-horizon AI alignment, in which an agent must balance learning about the environment against learning human preferences. On a toy problem, it shows that information-directed sampling achieves lower regret than naive exploration methods.

Contribution

The paper extends classic multi-armed bandit problems to include human preferences and costs, and proposes information-directed sampling as an approach to AI alignment in this setting.

Findings

01 Naive exploration algorithms perform poorly in the alignment setting.

02 Information-directed sampling achieves lower regret on the paper's toy problem.

03 Current algorithms like Thompson sampling are inadequate for this alignment task.
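
For readers unfamiliar with the method named in these findings, the block below sketches the standard information-directed sampling rule from the bandit literature. It is a reference definition only and is not necessarily the exact variant used in this paper. The symbols (\Delta_t, g_t, \Psi_t, A^*, Y_{t,a}) follow common usage in that literature and are assumptions here, since none of them appear on this page.

```latex
% Standard information-directed sampling (IDS), stated for a generic bandit.
% Assumed notation: A^* is the optimal action, R_a the reward of action a,
% Y_{t,a} the observation from playing a at time t, and \mathbb{E}_t, I_t the
% expectation and mutual information under the posterior at time t.
\begin{align*}
  \Delta_t(a) &= \mathbb{E}_t\!\left[ R_{A^*} - R_a \right]
      && \text{(expected regret of action } a\text{)} \\
  g_t(a) &= I_t\!\left( A^* ;\, Y_{t,a} \right)
      && \text{(information gained about } A^* \text{ by playing } a\text{)} \\
  \Delta_t(\pi) &= \sum_a \pi(a)\,\Delta_t(a), \qquad
  g_t(\pi) = \sum_a \pi(a)\, g_t(a)
      && \text{(averages under an action distribution } \pi\text{)} \\
  \Psi_t(\pi) &= \frac{\Delta_t(\pi)^2}{g_t(\pi)}
      && \text{(information ratio)} \\
  \pi_t^{\mathrm{IDS}} &\in \operatorname*{arg\,min}_{\pi} \Psi_t(\pi)
      && \text{(IDS samples } A_t \sim \pi_t^{\mathrm{IDS}}\text{)}
\end{align*}
```

The ratio trades squared expected regret against information gained, so an action with high immediate regret can still be chosen if it is sufficiently informative. Thompson sampling has no such explicit term for information gain.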

Abstract

The staggering feats of AI systems have brought to attention the topic of AI Alignment: aligning a "superintelligent" AI agent's actions with humanity's interests. Many existing frameworks/algorithms in alignment study the problem on a myopic horizon or study learning from human feedback in isolation, relying on the contrived assumption that the agent has already perfectly identified the environment. As a starting point to address these limitations, we define a class of bandit alignment problems as an extension of classic multi-armed bandit problems. A bandit alignment problem involves an agent tasked with maximizing long-run expected reward by interacting with an environment and a human, both involving details/preferences initially unknown to the agent. The reward of actions in the environment depends on both observed outcomes and human preferences. Furthermore, costs are associated…

Taxonomy

Topics: Data Stream Mining Techniques

Methods: Softmax · Attention Is All You Need