Signature Approach for Contextual Bandits with Nonlinear and Path-dependent Rewards

Xin Guo; Grace He; Xinyu Li

arXiv:2605.10313·cs.LG·May 12, 2026

TL;DR

This paper introduces a signature-transform-based method for contextual bandits with nonlinear, path-dependent rewards, pairing a sublinear regret guarantee with stronger empirical performance than classical bandit algorithms.

Contribution

It proposes \texttt{DisSigUCB}, a novel signature-based UCB algorithm with proven regret bounds for complex path-dependent reward functions.

Findings

01

\texttt{DisSigUCB} outperforms classical bandit algorithms in nonlinear settings.

02

The method achieves a regret bound of \(\tilde{\mathcal O}(\sqrt{(d+m)KT})\).

03

Synthetic and real-world experiments validate its effectiveness.

Abstract

We study contextual bandits with nonlinear and path-dependent rewards through a novel signature-transform-based approach. Leveraging the universal nonlinearity property of signatures, we approximate continuous path-dependent reward functionals by linear functionals in the signature space. This representation enables the use of efficient linear contextual bandit methods while preserving expressive sequential structure. Building on this framework, we propose \texttt{DisSigUCB}, a signature-based disjoint upper confidence bound (UCB) algorithm. Under boundedness and non-degeneracy assumptions, we prove a high-probability data-dependent sublinear regret bound of order \(\tilde{\mathcal O}(\sqrt{(d+m)KT})\) where \(d\) is the context dimension and \(m\) is the signature feature dimension. Synthetic experiments and numerical applications on temperature sensor monitoring, sleep-stage…
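To make the idea in the abstract concrete, here is a minimal Python sketch of the two ingredients it describes: mapping a path to truncated signature features, then running a disjoint (one model per arm) linear UCB rule on those features. It is an illustration under stated assumptions, not the authors' \texttt{DisSigUCB}: the truncation at depth 2, the piecewise-linear interpolation of the path, the names `signature_depth2` and `DisjointLinUCB`, and the parameters `alpha` and `ridge` are all hypothetical choices made here. The bandit part is the standard disjoint LinUCB update, and how the paper combines the \(d\)-dimensional context with the \(m\) signature features is not specified in the text above.

```python
import numpy as np

def signature_depth2(path):
    """Depth-2 truncated signature of a piecewise-linear path.

    `path` has shape (n_points, d). Returns [1, level 1, level 2 flattened],
    a vector of length 1 + d + d*d. Depth 2 is an illustrative assumption.
    """
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    level1 = np.zeros(d)
    level2 = np.zeros((d, d))
    for delta in np.diff(path, axis=0):
        # Chen's identity for appending one linear segment with increment delta.
        level2 += np.outer(level1, delta) + 0.5 * np.outer(delta, delta)
        level1 += delta
    return np.concatenate(([1.0], level1, level2.ravel()))

class DisjointLinUCB:
    """Standard disjoint LinUCB: an independent ridge regression per arm."""

    def __init__(self, n_arms, dim, alpha=1.0, ridge=1.0):
        self.alpha = alpha  # exploration weight (hypothetical default)
        self.A = np.stack([ridge * np.eye(dim) for _ in range(n_arms)])
        self.b = np.zeros((n_arms, dim))

    def select(self, x):
        scores = []
        for A_a, b_a in zip(self.A, self.b):
            theta = np.linalg.solve(A_a, b_a)             # ridge estimate
            bonus = np.sqrt(x @ np.linalg.solve(A_a, x))  # confidence width
            scores.append(theta @ x + self.alpha * bonus)
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# Usage on a toy 2-dimensional path observed at three time points.
features = signature_depth2([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
bandit = DisjointLinUCB(n_arms=2, dim=features.size)
arm = bandit.select(features)
bandit.update(arm, features, reward=1.0)
```

The signature features depend on the order in which the path moves (the off-diagonal level-2 terms differ when the two coordinate moves are swapped), which is what lets a linear model in signature space capture path-dependent rewards.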
