When Outcome Looks Right But Discipline Fails: Trace-Based Evaluation Under Hidden Competitor State

Peiying Zhu; Sidi Chang

arXiv:2605.18580·cs.AI·May 19, 2026

TL;DR

This paper introduces a trace-based evaluation paradigm that checks whether reinforcement learning agents preserve behavioral discipline under hidden state, rather than only whether they hit business KPIs.

Contribution

It proposes an evaluation framework that separates behavioral fidelity from outcome metrics using trace diagnostics, ablations, and transfer tests, demonstrated on a two-hotel pricing benchmark and a compact hidden-budget bidding task.

Findings

01 Reward-only PPO variants fail to maintain trace alignment.

02 Revealing hidden state reduces label uncertainty.

03 Trace-prior policies better preserve price and bid distributions.
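
One way to read the label-uncertainty finding is as a drop in conditional entropy: the benchmark's action is ambiguous given only what the learner observes, and becomes less ambiguous once the hidden state is revealed. The sketch below is an illustration under that assumption, not the paper's diagnostic. The function name, the toy samples, and the choice of competitor inventory as the hidden variable are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def conditional_entropy(contexts, labels):
    """Empirical H(label | context) in bits from paired samples."""
    groups = defaultdict(Counter)
    for context, label in zip(contexts, labels):
        groups[context][label] += 1

    n = len(labels)
    entropy = 0.0
    for counts in groups.values():
        total = sum(counts.values())
        weight = total / n  # empirical probability of this context
        for count in counts.values():
            p = count / total
            entropy -= weight * p * math.log2(p)
    return entropy


# Hypothetical toy samples: (visible demand, hidden competitor inventory,
# price chosen by the rule-based benchmark). Not data from the paper.
samples = [
    ("high", "low", 150),
    ("high", "full", 110),
    ("low", "low", 100),
    ("low", "full", 80),
]
labels = [price for _, _, price in samples]
visible_only = [demand for demand, _, _ in samples]
with_hidden = [(demand, inventory) for demand, inventory, _ in samples]

h_visible = conditional_entropy(visible_only, labels)
h_revealed = conditional_entropy(with_hidden, labels)
print(f"H(price | visible)         = {h_visible:.2f} bits")
print(f"H(price | visible, hidden) = {h_revealed:.2f} bits")
```

In this toy setup each visible demand level maps to two equally likely benchmark prices, so the learner faces one bit of label uncertainty; conditioning on the hidden variable removes it entirely.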

Abstract

Outcome-only evaluation can certify economically unsafe agents: a policy can hit a business KPI while violating deployable behavioral discipline. In hotel pricing with hidden competitor state, a learner can achieve plausible revenue per available room while failing to preserve the rate discipline of a rule-based revenue-management competitor. We introduce discipline stability, a trace-based evaluation paradigm: define the benchmark behavior, restrict observations to the deployment regime, induce trace diagnostics from failure, separate mechanisms with ablations, and test transfer and deployment. Across a two-hotel benchmark and a compact hidden-budget bidding task, reward-only PPO variants miss trace alignment; revealing hidden state reduces label uncertainty; deterministic copy collapses uncertainty; and trace-prior or corrected history policies better preserve price or bid…
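
The gap between outcome and discipline described in the abstract can be made concrete with a small sketch. It compares two price traces on revenue per available room and on a distributional distance between the traces themselves. Total variation distance is used here as a stand-in; the abstract does not state which trace metric the paper uses, and the function names and toy numbers are hypothetical.

```python
from collections import Counter


def revpar(prices, occupancies):
    """Revenue per available room: mean of price * occupancy rate per night."""
    return sum(p * o for p, o in zip(prices, occupancies)) / len(prices)


def total_variation(trace_a, trace_b):
    """Total variation distance between the empirical distributions of two traces."""
    count_a, count_b = Counter(trace_a), Counter(trace_b)
    support = set(count_a) | set(count_b)
    return 0.5 * sum(
        abs(count_a[x] / len(trace_a) - count_b[x] / len(trace_b))
        for x in support
    )


# Hypothetical toy traces (nightly price, occupancy rate); not from the paper.
benchmark_prices = [100, 100, 120, 120]
benchmark_occ = [0.8, 0.8, 0.7, 0.7]
agent_prices = [80, 80, 160, 160]
agent_occ = [0.9, 0.9, 0.6, 0.6]

outcome_gap = abs(
    revpar(agent_prices, agent_occ) - revpar(benchmark_prices, benchmark_occ)
)
trace_gap = total_variation(agent_prices, benchmark_prices)
print(f"RevPAR gap: {outcome_gap:.2f}")
print(f"Price-trace total variation: {trace_gap:.2f}")
```

With these numbers the two policies land within about two currency units of each other on RevPAR, yet their price distributions do not overlap at all, which is the kind of case an outcome-only check would pass.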
