Learning in Position-Aware Multinomial Logit Bandits: From Multiplicative to General Position Effects
Xi Chen, Shibo Dai, Jiameng Lyu, Yuan Zhou

TL;DR
This paper develops regret-optimal algorithms for dynamic assortment and positioning in multinomial logit models, handling both multiplicative and general position effects, with theoretical guarantees and empirical validation.
Contribution
It introduces the first regret-optimal algorithms for joint assortment and position optimization under both multiplicative and general position effects models.
Findings
Algorithms achieve regret bounds matching theoretical lower bounds.
The proposed methods outperform existing benchmarks on synthetic and real data.
Efficient optimization routines enable practical deployment in modern platforms.
Abstract
We study the dynamic joint assortment selection and positioning problem, where the attraction of each product depends on both its intrinsic appeal and its display position under a Multinomial Logit (MNL) choice framework. Our study ranges from the multiplicative position effects model, in which each product's attraction is scaled by a position-specific factor, to a general position effects model assigning independent attraction parameters to every product-position pair to capture heterogeneous synergies. For both models, we design round-based learning algorithms that update decisions after every single feedback, and establish the first regret-optimal characterization. Besides, our round-based algorithms provide the prompt operations needed by modern platforms. For the multiplicative model, we develop a cross-position pairwise maximum likelihood estimator with a clipping mechanism, and…
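To make the two position-effects models in the abstract concrete, the following is a minimal sketch of how MNL choice probabilities could be computed under each. It is not the authors' implementation. The names `v`, `lam`, `V`, and `assignment`, and the convention that the no-purchase option has attraction 1, are assumptions made for illustration; the abstract does not state the exact parametrization.

```python
import numpy as np

def multiplicative_attractions(v, lam, assignment):
    """Attraction of the product at position k is lam[k] * v[product].

    v: intrinsic attraction per product (hypothetical name).
    lam: position-specific scaling factor per position (hypothetical name).
    assignment[k]: index of the product displayed at position k.
    """
    v = np.asarray(v, dtype=float)
    lam = np.asarray(lam, dtype=float)
    assignment = np.asarray(assignment, dtype=int)
    return lam[: len(assignment)] * v[assignment]

def general_attractions(V, assignment):
    """Attraction of the product at position k is V[product, k].

    V holds one independent parameter per product-position pair.
    """
    V = np.asarray(V, dtype=float)
    assignment = np.asarray(assignment, dtype=int)
    return V[assignment, np.arange(len(assignment))]

def mnl_choice_probs(attractions):
    """MNL purchase probability for each displayed position.

    Assumes the no-purchase option has attraction 1, a common
    normalization that the abstract does not confirm.
    """
    a = np.asarray(attractions, dtype=float)
    return a / (1.0 + a.sum())

# Three products, two display positions; product 2 on top, product 0 second.
v = [1.0, 0.5, 2.0]
lam = [1.0, 0.5]
assignment = [2, 0]

probs_mult = mnl_choice_probs(multiplicative_attractions(v, lam, assignment))

# The general model contains the multiplicative one as the rank-one
# special case V[i, k] = v[i] * lam[k].
V = np.outer(v, lam)
probs_gen = mnl_choice_probs(general_attractions(V, assignment))
```

Under these assumptions the displayed attractions are 2.0 and 0.5, so the purchase probabilities are 2.0/3.5 and 0.5/3.5, with the remaining 1/3.5 going to no purchase. The general model has one free parameter per product-position pair rather than one per product plus one per position, which is what allows it to capture the heterogeneous synergies the abstract mentions.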
