Incorporating Behavioral Constraints in Online AI Systems
Avinash Balakrishnan, Djallel Bouneffouf, Nicholas Mattei, Francesca Rossi

TL;DR
This paper introduces a novel online learning agent that incorporates behavioral constraints learned from observation, ensuring decision-making aligns with ethical or regulatory standards without significantly sacrificing reward optimization.
Contribution
The paper proposes a new algorithm, Behavior Constrained Thompson Sampling (BCTS), which extends contextual bandits to incorporate learned behavioral constraints into online decision-making.
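The idea of restricting a contextual bandit to constraint-compliant actions can be illustrated with a minimal sketch. It is not the authors' BCTS. It assumes a per-arm linear-Gaussian Thompson sampling learner, and it assumes the learned constraint has already been turned into a boolean `allowed` mask over arms for the current context. The class name `ConstrainedLinTS` and the parameters `v` and `seed` are hypothetical.

```python
import numpy as np


class ConstrainedLinTS:
    """Hypothetical sketch (not the authors' BCTS): per-arm linear Thompson
    sampling that only samples and chooses among arms marked as allowed."""

    def __init__(self, n_arms, dim, v=1.0, seed=0):
        self.n_arms = n_arms
        self.v = v  # scales the posterior covariance, i.e. exploration strength
        self.B = [np.eye(dim) for _ in range(n_arms)]    # per-arm precision matrices
        self.f = [np.zeros(dim) for _ in range(n_arms)]  # per-arm sums of reward * context
        self.rng = np.random.default_rng(seed)

    def select(self, x, allowed):
        """Return the allowed arm with the highest sampled expected reward for context x."""
        if not np.any(allowed):
            raise ValueError("the constraint mask allows no arm")
        best_arm, best_score = None, -np.inf
        for a in range(self.n_arms):
            if not allowed[a]:
                continue  # disallowed arms are never sampled or played
            cov = np.linalg.inv(self.B[a])
            cov = (cov + cov.T) / 2  # symmetrize against floating-point round-off
            mu = cov @ self.f[a]
            theta = self.rng.multivariate_normal(mu, self.v ** 2 * cov)
            score = float(x @ theta)
            if score > best_score:
                best_arm, best_score = a, score
        return best_arm

    def update(self, a, x, r):
        """Bayesian linear-regression update for the arm that was played."""
        self.B[a] += np.outer(x, x)
        self.f[a] += r * x
```

For example, with `allowed = np.array([True, False, True])` the agent only ever returns arm 0 or arm 2. Filtering before sampling means exploration never touches disallowed arms; the trade-off is that reward estimates for those arms are never refined.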
Findings
The agent effectively learns and adheres to behavioral constraints.
It maintains near-optimal reward performance while respecting constraints.
Theoretical regret bounds are established for the proposed algorithm.
Abstract
AI systems that learn through reward feedback about the actions they take are increasingly deployed in domains that have significant impact on our daily life. However, in many cases the online rewards should not be the only guiding criterion, as there are additional constraints and/or priorities imposed by regulations, values, preferences, or ethical principles. We detail a novel online agent that learns a set of behavioral constraints by observation and uses these learned constraints as a guide when making decisions in an online setting while still being reactive to reward feedback. To define this agent, we adopt a novel extension to the classical contextual multi-armed bandit setting and provide a new algorithm called Behavior Constrained Thompson Sampling (BCTS) that allows for online learning while obeying exogenous constraints. Our agent learns a constrained policy…
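The abstract states that the constraints are learned by observation, but this summary does not give the paper's procedure. As a hypothetical stand-in, the sketch below fits one ridge-regression model per arm that predicts whether a demonstrator chose that arm in a given context, then thresholds the prediction into an allowed/disallowed mask that could gate a bandit learner. The function names `learn_constraint` and `allowed_arms` and the parameters `lam` and `threshold` are illustrative assumptions.

```python
import numpy as np


def learn_constraint(contexts, demo_arms, n_arms, lam=1.0):
    """Hypothetical stand-in for learning constraints by observation:
    one ridge regression per arm predicting 'the demonstrator chose this arm'."""
    X = np.asarray(contexts, dtype=float)  # shape (n_demos, dim)
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)          # shared regularized Gram matrix
    W = np.empty((n_arms, d))
    for a in range(n_arms):
        y = (np.asarray(demo_arms) == a).astype(float)  # 1 where the demonstrator picked arm a
        W[a] = np.linalg.solve(A, X.T @ y)
    return W


def allowed_arms(W, x, threshold=0.5):
    """Boolean mask of arms whose predicted demonstrator-choice score clears the threshold."""
    return W @ np.asarray(x, dtype=float) >= threshold
```

With ten demonstrations choosing arm 0 in context `[1, 0]` and ten choosing arm 1 in context `[0, 1]`, `allowed_arms(W, [1.0, 0.0])` permits only arm 0, since its score is 10/11 while the other arms score 0. The threshold controls how closely the learner must track the demonstrated behavior.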
