Signature Approach for Contextual Bandits with Nonlinear and Path-dependent Rewards
Xin Guo, Grace He, Xinyu Li

TL;DR
This paper introduces a signature-transform-based method for nonlinear, path-dependent contextual bandits, enabling efficient learning with theoretical guarantees and superior empirical performance.
Contribution
It proposes \texttt{DisSigUCB}, a novel signature-based UCB algorithm with proven regret bounds for complex path-dependent reward functions.
Findings
\texttt{DisSigUCB} outperforms classical bandit algorithms in nonlinear settings.
The method achieves a high-probability regret bound of order \(\tilde{\mathcal O}(\sqrt{(d+m)KT})\), where \(d\) is the context dimension and \(m\) is the signature feature dimension.
Synthetic and real-world experiments validate its effectiveness.
Abstract
We study contextual bandits with nonlinear and path-dependent rewards through a novel signature-transform-based approach. Leveraging the universal nonlinearity property of signatures, we approximate continuous path-dependent reward functionals by linear functionals in the signature space. This representation enables the use of efficient linear contextual bandit methods while preserving expressive sequential structure. Building on this framework, we propose \texttt{DisSigUCB}, a signature-based disjoint upper confidence bound (UCB) algorithm. Under boundedness and non-degeneracy assumptions, we prove a high-probability data-dependent sublinear regret bound of order \(\tilde{\mathcal O}(\sqrt{(d+m)KT})\) where \(d\) is the context dimension and \(m\) is the signature feature dimension. Synthetic experiments and numerical applications on temperature sensor monitoring, sleep-stage…
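The pipeline described in the abstract (map a path to signature features, then run a disjoint linear UCB rule on those features) can be illustrated with a minimal sketch. This is not the authors' implementation: the depth-2 truncation, the helper `signature_depth2`, the class `DisjointLinUCB`, and the parameters `alpha` and `ridge` are all assumptions made for illustration, and the bandit part is the standard disjoint LinUCB update rather than the exact rule of \texttt{DisSigUCB}.

```python
import numpy as np


def signature_depth2(path):
    """Depth-2 truncated signature of a piecewise-linear path.

    `path` has shape (n_points, dim). Returns a vector of length dim + dim**2
    (level-1 terms followed by the flattened level-2 terms).
    Hypothetical helper, written for illustration only.
    """
    path = np.asarray(path, dtype=float)
    dim = path.shape[1]
    level1 = np.zeros(dim)
    level2 = np.zeros((dim, dim))
    for delta in np.diff(path, axis=0):
        # Chen's identity for appending one linear segment with increment delta.
        level2 += np.outer(level1, delta) + 0.5 * np.outer(delta, delta)
        level1 += delta
    return np.concatenate([level1, level2.ravel()])


class DisjointLinUCB:
    """Standard disjoint LinUCB: one ridge-regression model per arm (assumed setup)."""

    def __init__(self, n_arms, n_features, alpha=1.0, ridge=1.0):
        self.alpha = alpha  # exploration weight (illustrative default)
        self.A = np.stack([ridge * np.eye(n_features) for _ in range(n_arms)])
        self.b = np.zeros((n_arms, n_features))

    def select(self, x):
        """Return the arm with the largest upper confidence bound for features x."""
        scores = []
        for A_k, b_k in zip(self.A, self.b):
            theta_k = np.linalg.solve(A_k, b_k)   # ridge estimate for this arm
            A_inv_x = np.linalg.solve(A_k, x)
            scores.append(theta_k @ x + self.alpha * np.sqrt(x @ A_inv_x))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Rank-one update of the chosen arm's statistics."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x


# Usage: a 2-dimensional path observed at three time points.
path = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 1.0]])
features = signature_depth2(path)
bandit = DisjointLinUCB(n_arms=2, n_features=features.size)
arm = bandit.select(features)
bandit.update(arm, features, reward=1.0)
```

In this sketch the reward model for each arm is linear in the signature features, which is the role the universal nonlinearity property plays in the abstract; a static context vector could be concatenated with `features` before calling `select`, which is one plausible reading of the \(d+m\) term in the bound.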
