Markov Decision Process modeled with Bandits for Sequential Decision Making in Linear-flow

Wenjun Zeng; Yi Liu

arXiv:2107.00204·cs.LG·March 18, 2022·1 cites

Open Access

TL;DR

This paper introduces an MDP with Bandits framework for sequential decision making in linear-flow marketing scenarios. It uses Thompson sampling to allocate actions efficiently and outperforms Q-learning and independent Bandits in simulations.

Contribution

The paper formulates a new MDP with Bandits approach for linear-flow sequential decision making, combining Thompson sampling with dynamic programming for improved efficiency and robustness.

Findings

01 The proposed method outperforms Q-learning and independent Bandits in simulations.

02 The approach is the most robust to changes in interdependence strength across pages.

03 The formulation leverages Thompson sampling's balance of exploration and exploitation.

Abstract

For marketing, we sometimes need to recommend content for multiple pages in sequence. Unlike the general sequential decision-making process, these use cases have a simpler flow: after seeing the recommended content on each page, customers can only respond by moving forward in the process or dropping out of it, until a termination state is reached. We refer to this type of problem as sequential decision making in linear-flow. We propose to formulate the problem as an MDP with Bandits, where Bandits are employed to model the transition probability matrix. At recommendation time, we use Thompson sampling (TS) to sample the transition probabilities and allocate the best series of actions with an analytical solution through exact dynamic programming. The way that we formulate the problem allows us to leverage TS's efficiency in balancing exploration and exploitation and Bandit's convenience in…
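The loop described in the abstract (sample transition probabilities, then plan with exact dynamic programming) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a Beta-Bernoulli Bandit for each (page, previous action, current action) triple, a reward of 1 for reaching the end of the flow, and a dummy previous action 0 before the first page. The function names `sample_transitions`, `plan_actions`, and `update` are hypothetical.

```python
import random


def sample_transitions(alpha, beta, rng):
    """Thompson step: draw one forward-probability per (page, prev action, action).

    Assumption: each entry has an independent Beta(alpha, beta) posterior.
    """
    return [[[rng.betavariate(alpha[i][s][a], beta[i][s][a])
              for a in range(len(alpha[i][s]))]
             for s in range(len(alpha[i]))]
            for i in range(len(alpha))]


def plan_actions(p):
    """Exact backward dynamic programming over a linear flow.

    p[i][prev][a] is the (sampled) probability that a customer moves past
    page i when action a is shown after action prev on the previous page.
    Returns the action sequence that maximizes the probability of reaching
    the end of the flow, together with that probability.
    """
    n_pages, n_actions = len(p), len(p[0])
    value = [1.0] * n_actions  # value after the last page: flow completed
    best = [[0] * n_actions for _ in range(n_pages)]
    for i in range(n_pages - 1, -1, -1):
        new_value = [0.0] * n_actions
        for prev in range(n_actions):
            scores = [p[i][prev][a] * value[a] for a in range(n_actions)]
            best[i][prev] = max(range(n_actions), key=scores.__getitem__)
            new_value[prev] = scores[best[i][prev]]
        value = new_value
    # Roll the policy forward from the dummy previous action 0.
    actions, prev = [], 0
    for i in range(n_pages):
        prev = best[i][prev]
        actions.append(prev)
    return actions, value[0]


def update(alpha, beta, actions, pages_passed):
    """Update posteriors in place from one customer's feedback.

    pages_passed is the number of pages the customer moved past before
    dropping (equal to len(actions) if the whole flow was completed).
    """
    prev = 0
    for i, a in enumerate(actions):
        if i < pages_passed:
            alpha[i][prev][a] += 1  # moved forward from page i
        else:
            beta[i][prev][a] += 1   # dropped at page i; later pages unobserved
            break
        prev = a


# Illustrative, made-up probabilities: 2 pages, 2 actions per page.
p_demo = [[[0.9, 0.5], [0.0, 0.0]],
          [[0.1, 0.2], [0.2, 0.9]]]
demo_actions, demo_value = plan_actions(p_demo)
```

In this toy example a page-by-page greedy choice would pick action 0 on the first page (0.9) and then be limited to 0.2 on the second, while the backward pass accepts a weaker first page to reach a stronger second one. That interaction between pages is the reason to plan over the whole sequence rather than run one Bandit per page.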

Taxonomy

Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Optimization and Search Problems

Methods: Q-Learning