A Contextual Combinatorial Bandit Approach to Negotiation
Yexin Li, Zhancun Mu, Siyuan Qi

TL;DR
This paper presents NegUCB, a contextual combinatorial bandit method for negotiation that handles exploration, large action spaces, and partial feedback, and outperforms baseline methods on three negotiation tasks.
Contribution
The paper introduces a general formulation of negotiation as a contextual combinatorial bandit problem and a new algorithm, NegUCB, that addresses exploration, large action spaces, and partial observations.
Findings
Under mild assumptions, NegUCB guarantees a sub-linear regret upper bound.
NegUCB outperforms baseline methods on three negotiation tasks.
NegUCB handles partial feedback and complex reward functions.
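The sub-linear regret finding can be read in the standard bandit sense. The statement below is a sketch in generic notation: the symbols $T$ (number of rounds), $c_t$ (context), $S_t$ (chosen super arm), $S_t^*$ (best super arm in round $t$), and $r$ (reward) are assumptions for illustration and may differ from the paper's.

```latex
R(T) = \sum_{t=1}^{T} \Big( \mathbb{E}\big[r(c_t, S_t^*)\big] - \mathbb{E}\big[r(c_t, S_t)\big] \Big),
\qquad
\lim_{T \to \infty} \frac{R(T)}{T} = 0 .
```

In words, the average per-round loss relative to the best available action vanishes as the number of rounds grows. A bound of order $\sqrt{T}$, for instance, would satisfy this condition; the exact rate proved for NegUCB is not given in this summary.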
Abstract
Learning effective negotiation strategies poses two key challenges: the exploration-exploitation dilemma and dealing with large action spaces. However, there is an absence of learning-based approaches that effectively address these challenges in negotiation. This paper introduces a comprehensive formulation to tackle various negotiation problems. Our approach leverages contextual combinatorial multi-armed bandits, with the bandit framework resolving the exploration-exploitation dilemma and the combinatorial structure handling large action spaces. Building upon this formulation, we introduce NegUCB, a novel method that also handles common issues such as partial observations and complex reward functions in negotiation. NegUCB is contextual and tailored for full-bandit feedback without constraints on the reward functions. Under mild assumptions, it ensures a sub-linear regret upper bound. Experiments…
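To make the setting in the abstract concrete, here is a minimal sketch of a contextual combinatorial bandit with full-bandit feedback. It is not NegUCB: it is a generic LinUCB-style learner, and it assumes a linear reward model, which is more restrictive than the unconstrained reward functions the abstract describes. The class name `LinUCBSuperArm`, the `simulate` helper, the sum-of-item-features encoding, and all default parameters are hypothetical choices made for illustration.

```python
import itertools
import numpy as np


class LinUCBSuperArm:
    """Generic LinUCB over super arms (subsets of items); not NegUCB itself."""

    def __init__(self, dim, alpha=1.0, reg=1.0):
        self.alpha = alpha            # exploration weight (hypothetical default)
        self.A = reg * np.eye(dim)    # regularized Gram matrix of seen features
        self.b = np.zeros(dim)        # reward-weighted sum of seen features

    @staticmethod
    def features(item_feats, subset):
        # Assumption: a super arm is summarized by the sum of its item features.
        return item_feats[list(subset)].sum(axis=0)

    def select(self, item_feats, k):
        """Return the size-k subset with the highest upper confidence bound."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b        # ridge-regression estimate of the weights

        def ucb(subset):
            x = self.features(item_feats, subset)
            return x @ theta + self.alpha * np.sqrt(x @ A_inv @ x)

        return max(itertools.combinations(range(len(item_feats)), k), key=ucb)

    def update(self, item_feats, subset, reward):
        """Full-bandit feedback: one scalar reward for the whole subset."""
        x = self.features(item_feats, subset)
        self.A += np.outer(x, x)
        self.b += reward * x


def simulate(rounds=300, n_items=6, k=2, dim=4, noise=0.1, seed=0):
    """Run the learner on synthetic linear rewards; return cumulative regret."""
    rng = np.random.default_rng(seed)
    theta_true = rng.normal(size=dim)     # hidden weights of the synthetic task
    agent = LinUCBSuperArm(dim)
    regrets = []
    for _ in range(rounds):
        item_feats = rng.normal(size=(n_items, dim))  # fresh context each round

        def mean_reward(subset):
            return float(LinUCBSuperArm.features(item_feats, subset) @ theta_true)

        chosen = agent.select(item_feats, k)
        best = max(mean_reward(s)
                   for s in itertools.combinations(range(n_items), k))
        # Only the noisy total reward of the chosen subset is observed.
        agent.update(item_feats, chosen, mean_reward(chosen) + noise * rng.normal())
        regrets.append(best - mean_reward(chosen))
    return np.cumsum(regrets)
```

Calling `simulate()` returns the cumulative regret curve for the synthetic task. Enumerating every subset, as `select` does here, is only viable for small item sets, which is one reason large action spaces are singled out as a challenge.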
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Artificial Intelligence in Games
