Constrained Policy Optimization for Provably Fair Order Matching
Zehua Cheng, Zhipeng Wang, Wei Dai, Wenhu Zhang, Vadzim Mahilny, David Shi, Elena Jia, Jiahao Sun

TL;DR
This paper introduces CPO-FOAM, a constrained policy optimization method for order matching that provides provable fairness and stability guarantees while maintaining high throughput across diverse market regimes.
Contribution
It formulates fair order matching as a Constrained Markov Decision Process and develops a scalable, stable optimization algorithm with proven fairness and efficiency guarantees.
Findings
Recovers 95.9% of unconstrained throughput at a 2.5% constraint violation frequency (CVF) on NASDAQ data.
Achieves 98.4% reward envelope capture at 3.2% CVF on crypto-asset data.
Scales sub-linearly to 8 constraints and improves reward by 2.1× on Safety-Gymnasium.
Abstract
Automated matching engines execute millions of orders per session, yet systematic asymmetries in latency, order size, and market access compound into persistent execution disparities that erode participant trust. We formulate provably fair order matching as a Constrained Markov Decision Process and propose CPO-FOAM (Constrained Policy Optimization with Feedback-Optimized Adaptive Margins). An inner loop computes an analytic trust-region step on the Fisher information manifold; a PID-controlled outer loop dynamically tightens safety margins, suppressing the sawtooth oscillations endemic to Lagrangian methods under non-stationary dynamics. Group fairness (demographic parity, equalized odds) enters the CMDP cost vector while individual Lipschitz fairness is enforced deterministically via spectral normalization. We prove BIBO stability and that the integral term drives steady-state…
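To make the outer loop described in the abstract concrete, here is a minimal sketch of a PID-controlled safety margin for a single constraint. It is not the paper's implementation: the class `PIDMargin`, the gains `kp`, `ki`, `kd`, and the toy plant in `simulate` (a policy that always overshoots its effective limit by a fixed `bias`) are all hypothetical, chosen only to illustrate how an integral term can cancel a persistent overshoot.

```python
from dataclasses import dataclass


@dataclass
class PIDMargin:
    """Hypothetical PID controller that adapts a safety margin for one constraint."""
    kp: float = 0.5    # proportional gain (assumed value)
    ki: float = 0.1    # integral gain (assumed value)
    kd: float = 0.05   # derivative gain (assumed value)
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, cost: float, limit: float) -> float:
        # Positive error means the measured cost exceeds the nominal limit.
        error = cost - limit
        # Accumulate the error; clamp at zero so the margin never loosens the limit.
        self.integral = max(0.0, self.integral + error)
        derivative = error - self.prev_error
        self.prev_error = error
        margin = self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(0.0, margin)


def simulate(steps: int, limit: float = 1.0, bias: float = 0.2):
    """Toy plant (an assumption): the policy's cost lands `bias` above the
    tightened limit it was optimized against in the previous step."""
    controller = PIDMargin()
    margin = 0.0
    cost = limit + bias
    for _ in range(steps):
        cost = (limit - margin) + bias      # policy overshoots the effective limit
        margin = controller.update(cost, limit)
    return cost, margin


if __name__ == "__main__":
    final_cost, final_margin = simulate(300)
    print(f"final cost = {final_cost:.4f}, final margin = {final_margin:.4f}")
```

In this toy setting the margin settles near `bias`, so the measured cost settles near the nominal limit. A real CMDP training loop would replace `simulate` with the policy update and a noisy cost estimate per constraint, one controller per entry of the cost vector.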
