# Guaranteed satisficing and finite regret: Analysis of a cognitive satisficing value function

**Authors:** Akihiro Tamatsukuri, Tatsuji Takahashi

arXiv: 1812.05795 · 2025-04-16

## TL;DR

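This paper introduces risk-sensitive satisficing (RS), a simple model for reinforcement learning that looks for an action whose value is above an aspiration level rather than for the optimal action. In $K$-armed bandit problems, RS is guaranteed to find such an action, and its regret is bounded by a finite value when the aspiration level is set to an optimal level.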

## Contribution

The paper presents the RS model, proves for $K$-armed bandit problems that it finds a satisficing action and that its regret is finite under an optimal aspiration level, and checks both results in numerical simulations.

## Key findings

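- RS is guaranteed to find an action whose value is above the aspiration level.
- The regret (expected loss) of RS is upper bounded by a finite value when the aspiration level is set to the optimal level, at which satisficing implies optimizing.
- Numerical simulations confirm both results and compare RS with other representative algorithms for $K$-armed bandit problems.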
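
For reference, the two guarantees can be written in standard bandit notation. The symbols $\mu_i$ (mean reward of arm $a_i$), $\mu^*$, $a_t$ (arm chosen at step $t$), $T$ and $\aleph$ (aspiration level) are assumptions introduced here for illustration; the paper's own notation may differ.

$$
\text{satisficing: find } a_i \text{ with } \mu_i > \aleph,
\qquad
\mathrm{Regret}(T) = \sum_{t=1}^{T} \bigl(\mu^* - \mathbb{E}[\mu_{a_t}]\bigr),
\quad \mu^* = \max_i \mu_i .
$$

Under this reading, finite regret means that $\mathrm{Regret}(T) \le C$ for some constant $C$ that does not grow with $T$.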

## Abstract

As reinforcement learning algorithms are being applied to increasingly complicated and realistic tasks, it is becoming increasingly difficult to solve such problems within a practical time frame. Hence, we focus on a *satisficing* strategy that looks for an action whose value is above the aspiration level (analogous to the break-even point), rather than the optimal action. In this paper, we introduce a simple mathematical model called risk-sensitive satisficing ($RS$) that implements a satisficing strategy by integrating risk-averse and risk-prone attitudes under the greedy policy. We apply the proposed model to the $K$-armed bandit problems, which constitute the most basic class of reinforcement learning tasks, and prove two propositions. The first is that $RS$ is guaranteed to find an action whose value is above the aspiration level. The second is that the regret (expected loss) of $RS$ is upper bounded by a finite value, given that the aspiration level is set to an "optimal level" so that satisficing implies optimizing. We confirm the results through numerical simulations and compare the performance of $RS$ with that of other representative algorithms for the $K$-armed bandit problems.
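
The following is a minimal sketch of a greedy satisficing agent on a Bernoulli $K$-armed bandit, not the authors' implementation. It assumes a value function of the form `(n_i / N) * (E_i - aspiration)`, where `n_i` is the pull count of arm `i`, `N` the total number of pulls, and `E_i` the empirical mean reward; this summary does not state the formula, so check it against the full paper. The function names, the initial pull of every arm, and the regret bookkeeping (sum of gaps between the best mean and the chosen arm's mean) are likewise illustrative assumptions.

```python
import random


def rs_values(counts, means, aspiration):
    """Assumed RS-style value: reliability (n_i / N) times (E_i - aspiration)."""
    total = sum(counts)
    return [(n / total) * (m - aspiration) for n, m in zip(counts, means)]


def run_rs_bandit(probs, aspiration, steps, seed=0):
    """Run a greedy policy over the assumed RS values on a Bernoulli bandit.

    probs: true success probability of each arm (hypothetical test setup).
    Returns (pull counts, empirical means, cumulative regret).
    """
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k
    means = [0.0] * k
    best = max(probs)
    regret = 0.0

    def pull(arm):
        nonlocal regret
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the empirical mean reward.
        means[arm] += (reward - means[arm]) / counts[arm]
        # Regret accumulates the gap between the best arm and the chosen arm.
        regret += best - probs[arm]

    # Assumption: pull every arm once so that N > 0 before using the values.
    for arm in range(k):
        pull(arm)

    for _ in range(steps - k):
        values = rs_values(counts, means, aspiration)
        arm = max(range(k), key=values.__getitem__)  # greedy choice
        pull(arm)

    return counts, means, regret


if __name__ == "__main__":
    # Aspiration placed between the two largest means, so only the best arm
    # is satisfactory and satisficing implies optimizing.
    counts, means, regret = run_rs_bandit([0.2, 0.5, 0.8], 0.65, steps=1000)
    print(counts, [round(m, 3) for m in means], round(regret, 2))
```

With an aspiration level below several arms' means, the same loop would be expected to settle on any arm that clears the level rather than on the best one, which is the satisficing behaviour the abstract describes.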

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1812.05795/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/1812.05795/full.md

## References

32 references — full list in the complete paper: https://tomesphere.com/paper/1812.05795/full.md

---
Source: https://tomesphere.com/paper/1812.05795