Intrinsic Credit Assignment for Long Horizon Interaction

Ilze Amanda Auzina; Joschka Strüber; Sergio Hernández-Gutiérrez; Shashwat Goel; Ameya Prabhu; Matthias Bethge

arXiv:2602.12342·cs.LG·February 16, 2026

Open Access

TL;DR

This paper introduces ΔBelief-RL, a reinforcement learning method that uses intrinsic belief changes to assign credit over long horizons, improving information-seeking and out-of-distribution performance.

Contribution

It presents a scalable training strategy that uses intrinsic belief-based rewards to navigate uncertainty over long horizons, outperforming purely outcome-based rewards and generalizing to out-of-distribution tasks.

Findings

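01. ΔBelief-RL consistently outperforms purely outcome-based rewards, including on out-of-distribution applications ranging from customer service to personalization.

02. Performance continues to improve as test-time interactions scale beyond the training horizon.

03. Interaction efficiency increases even on Pass@k metrics.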

Abstract

How can we train agents to navigate uncertainty over long horizons? In this work, we propose ΔBelief-RL, which leverages a language model's own intrinsic beliefs to reward intermediate progress. Our method utilizes the change in the probability an agent assigns to the target solution for credit assignment. By training on synthetic interaction data, ΔBelief-RL teaches information-seeking capabilities that consistently outperform purely outcome-based rewards for Reinforcement Learning, with improvements generalizing to out-of-distribution applications ranging from customer service to personalization. Notably, the performance continues to improve as we scale test-time interactions beyond the training horizon, with interaction-efficiency increasing even on Pass@k metrics. Overall, our work introduces a scalable training strategy for navigating uncertainty over a long-horizon,…
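
The credit-assignment idea in the abstract can be illustrated with a minimal sketch. It assumes the per-turn reward is the plain difference between the probabilities the agent assigns to the target solution before and after each turn. The abstract does not specify the exact form (for example, raw versus log probability) or how this signal is combined with the outcome reward, so the function name `delta_belief_rewards` and the belief values below are hypothetical.

```python
from typing import List


def delta_belief_rewards(target_probs: List[float]) -> List[float]:
    """Per-turn intrinsic rewards from successive beliefs (illustrative only).

    target_probs[t] is the probability the agent assigns to the target
    solution after t interaction turns (index 0 = before any interaction).
    ASSUMPTION: reward_t = p_t - p_{t-1}; the paper may use another form.
    """
    if len(target_probs) < 2:
        return []  # no turn has been taken, so there is nothing to reward
    return [later - earlier
            for earlier, later in zip(target_probs, target_probs[1:])]


# Hypothetical belief trajectory over four interaction turns.
beliefs = [0.05, 0.10, 0.08, 0.40, 0.90]
rewards = delta_belief_rewards(beliefs)

for turn, reward in enumerate(rewards, start=1):
    print(f"turn {turn}: reward {reward:+.2f}")
```

Under this assumption, a turn that lowers the belief in the target (turn 2 here) receives a negative reward, and the rewards telescope: their sum equals the final belief minus the initial belief, so each turn is credited only for the progress it contributes.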

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI)