Choice-Model-Assisted Q-learning for Delayed-Feedback Revenue Management
Owen Shen, Patrick Jaillet

TL;DR
This paper introduces a choice-model-assisted reinforcement learning approach for revenue management with delayed feedback. It proves convergence in the fixed-model tabular setting and reports empirical robustness in hotel booking simulations, with benefits and limitations that depend on the accuracy of the choice model.
Contribution
It proposes a Q-learning method for delayed-feedback revenue management that is assisted by a fixed choice model, with convergence guarantees and an empirical evaluation in a simulator calibrated from real hotel bookings.
Findings
Converges to a neighborhood of the optimal Q-function, with error bounded in terms of the partial-model error.
Shows robustness to parameter shifts in simulations.
Degrades under model misspecification, indicating bias risks.
Abstract
We study reinforcement learning for revenue management with delayed feedback, where a substantial fraction of value is determined by customer cancellations and modifications observed days after booking. We propose choice-model-assisted RL: a calibrated discrete choice model is used as a fixed partial world model to impute the delayed component of the learning target at decision time. In the fixed-model deployment regime, we prove that tabular Q-learning with model-imputed targets converges to a neighborhood of the optimal Q-function whose size is determined by a quantity summarizing partial-model error, with an additional sampling term. Experiments in a simulator calibrated from 61,619 hotel bookings (1,088 independent runs) show: (i) no statistically detectable difference from a maturity-buffer DQN baseline in stationary settings; (ii) positive…
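The core idea in the abstract, imputing the delayed part of the reward with a fixed model at decision time, can be illustrated with a minimal tabular sketch. This is not the authors' implementation: the toy environment, the two price levels, the booking and cancellation probabilities, and every function name below are hypothetical assumptions chosen only to show where the model-imputed term enters the Q-learning target.

```python
import random

# Hypothetical toy setup: every number here is an illustrative assumption,
# not a value from the paper.
PRICES = [80.0, 120.0]                 # actions are indices into PRICES
BOOK_PROB = {0: 0.8, 1: 0.5}           # chance an arriving customer books
CANCEL_PROB_HAT = {0: 0.30, 1: 0.10}   # fixed model's estimate of later cancellation
GAMMA = 0.95                           # discount factor


def imputed_reward(action, booked):
    """Booking revenue with the delayed cancellation loss imputed by the fixed model."""
    if not booked:
        return 0.0
    return PRICES[action] * (1.0 - CANCEL_PROB_HAT[action])


def step(rooms, action, rng):
    """Simulate one customer arrival; returns (next_rooms, booked)."""
    booked = rooms > 0 and rng.random() < BOOK_PROB[action]
    return (rooms - 1 if booked else rooms), booked


def train(capacity=3, episodes=5000, horizon=10, alpha=0.1, eps=0.1, seed=0):
    """Tabular Q-learning whose targets use the imputed reward, never the true outcome."""
    rng = random.Random(seed)
    n_actions = len(PRICES)
    q = {(r, a): 0.0 for r in range(capacity + 1) for a in range(n_actions)}
    for _ in range(episodes):
        rooms = capacity
        for _ in range(horizon):
            # Epsilon-greedy action selection over price levels.
            if rng.random() < eps:
                action = rng.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: q[(rooms, a)])
            next_rooms, booked = step(rooms, action, rng)
            # The delayed component is filled in by the model at decision time.
            target = imputed_reward(action, booked) + GAMMA * max(
                q[(next_rooms, a)] for a in range(n_actions)
            )
            q[(rooms, action)] += alpha * (target - q[(rooms, action)])
            rooms = next_rooms
    return q
```

Calling `train()` returns a dictionary keyed by `(rooms, action)`. Because the agent never waits for the realized cancellation, any error in `CANCEL_PROB_HAT` feeds directly into the learned values, which is the bias risk the findings describe under model misspecification.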
Taxonomy
Topics: Supply Chain and Inventory Management · Advanced Queuing Theory Analysis · Consumer Market Behavior and Pricing
