Market-Alignment Risk in Pricing Agents: Trace Diagnostics and Trace-Prior RL under Hidden Competitor State

Peiying Zhu; Sidi Chang

arXiv:2605.06529 · cs.AI · May 8, 2026

TL;DR

This paper identifies failure modes of revenue-management RL agents under partial observability, shows that scalar outcome metrics can mask them, and proposes a trace-level diagnostic protocol together with a repair method called Trace-Prior RL.

Contribution

It introduces a trace-level diagnostic protocol and a Trace-Prior RL method that learns a market prior to improve distributional pricing policies under hidden competitor states.

Findings

01. Deterministic RL collapses uncertainty into shortcut behaviors.

02. Trace-Prior RL matches market metrics within seed-level uncertainty.

03. Higher action accuracy can worsen trace alignment when optimizing distributionally.
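
The first and third findings can be illustrated with a small worked example. The sketch below assumes a hypothetical three-bucket distribution of Hotel B's price for a single Hotel A-visible state (the numbers are illustrative, not from the paper) and uses total variation distance as a stand-in alignment measure, which is an assumption rather than the paper's metric.

```python
# Hypothetical distribution of Hotel B's price bucket for ONE Hotel A-visible state.
# Illustrative numbers only, not taken from the paper.
market = [0.40, 0.35, 0.25]


def argmax_policy(p):
    """Deterministic copy: always play the modal bucket."""
    mode = max(range(len(p)), key=lambda i: p[i])
    return [1.0 if i == mode else 0.0 for i in range(len(p))]


def expected_accuracy(policy, target):
    """Probability that an independent draw from the policy equals the market's draw."""
    return sum(q * p for q, p in zip(policy, target))


def total_variation(policy, target):
    """Total variation distance between bucket distributions (assumed alignment measure)."""
    return 0.5 * sum(abs(q - p) for q, p in zip(policy, target))


deterministic = argmax_policy(market)
distributional = list(market)  # a policy that samples from the market distribution

acc_det = expected_accuracy(deterministic, market)
acc_dist = expected_accuracy(distributional, market)
tv_det = total_variation(deterministic, market)
tv_dist = total_variation(distributional, market)

print(f"deterministic:  accuracy={acc_det:.3f}  tv={tv_det:.3f}")
print(f"distributional: accuracy={acc_dist:.3f}  tv={tv_dist:.3f}")
```

Under these assumed numbers the modal policy is the more accurate of the two, yet its bucket distribution sits far from the market's, while the sampling policy gives up some accuracy and matches the distribution exactly.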

Abstract

Outcome metrics can certify the wrong behavior. We study this failure in a two-hotel revenue-management simulator where Hotel A trains an agent against a fixed rule-based revenue-management competitor, Hotel B. A standard learning agent can obtain near-reference revenue per available room (RevPAR) while failing to learn market-like yield management: it sells too aggressively, undercuts, or collapses to modal price buckets. We diagnose this as a Goodhart-style failure under partial observability. Hotel A cannot observe the competitor's remaining inventory, booking curve, or pricing rule, so the same Hotel A-visible state maps to multiple plausible Hotel B prices. Deterministic value-based RL and deterministic copying collapse this unresolved uncertainty into shortcut behavior. We introduce a trace-level diagnostic protocol using RevPAR, occupancy, ADR, full price-bucket distributions,…
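
The outcome and trace metrics named in the abstract can be sketched as follows. RevPAR, occupancy, and ADR (average daily rate) use their standard industry definitions; the `Night` record, the bucket edges, and both sample traces are hypothetical and not taken from the paper.

```python
from bisect import bisect_right
from collections import Counter
from dataclasses import dataclass


@dataclass
class Night:
    """One night of a pricing trace (hypothetical record layout)."""
    rooms_available: int
    rooms_sold: int
    price: float  # price posted for that night


BUCKET_EDGES = [100.0, 150.0, 200.0]  # hypothetical edges, giving 4 price buckets


def summarize(trace):
    """Return scalar outcome metrics plus the share of nights in each price bucket."""
    available = sum(n.rooms_available for n in trace)
    sold = sum(n.rooms_sold for n in trace)
    revenue = sum(n.rooms_sold * n.price for n in trace)
    buckets = Counter(bisect_right(BUCKET_EDGES, n.price) for n in trace)
    return {
        "revpar": revenue / available,           # revenue per available room
        "occupancy": sold / available,           # fraction of rooms sold
        "adr": revenue / sold if sold else 0.0,  # average daily rate
        "bucket_share": [buckets[b] / len(trace) for b in range(len(BUCKET_EDGES) + 1)],
    }


# Two hypothetical traces chosen to have the same total revenue.
market_like = [Night(10, 6, 180.0), Night(10, 8, 120.0), Night(10, 4, 210.0)]
undercutter = [Night(10, 10, 95.0), Night(10, 10, 98.0), Night(10, 10, 95.0)]

m = summarize(market_like)
u = summarize(undercutter)
for name, stats in (("market_like", m), ("undercutter", u)):
    print(name, stats)
```

In this constructed example both traces have the same RevPAR, so a RevPAR-only check cannot tell them apart. The occupancy, ADR, and bucket shares do: the second trace sells every room at a low price and places all of its nights in the lowest bucket.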
