OOM-RL: Out-of-Money Reinforcement Learning Market-Driven Alignment for LLM-Based Multi-Agent Systems
Kun Liu, Liqun Chen

TL;DR
This paper introduces Out-of-Money Reinforcement Learning (OOM-RL), an alignment approach that deploys multi-agent systems in live financial markets and uses economic penalties to improve robustness and reduce model sycophancy.
Contribution
The paper presents an objective alignment paradigm that deploys agents in live markets, and evaluates it through a 20-month empirical study conducted under a strict test-driven workflow.
Findings
The final system achieved an annualized Sharpe ratio of 2.06.
OOM-RL reduced model hallucinations and overfitting.
The system evolved from a high-turnover baseline to a stable, liquidity-aware architecture.
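For readers unfamiliar with the metric in the first finding, the following is a minimal sketch of how an annualized Sharpe ratio is commonly computed. It is not taken from the paper: the daily sampling, the 252 trading periods per year, the zero default risk-free rate, and the function name `annualized_sharpe` are all assumptions made for illustration.

```python
import math
import statistics


def annualized_sharpe(returns, periods_per_year=252, risk_free_rate=0.0):
    """Annualized Sharpe ratio from a sequence of per-period simple returns.

    Assumptions (not from the paper): returns are sampled once per trading
    period, there are `periods_per_year` periods in a year, and
    `risk_free_rate` is an annual rate spread evenly across periods.
    """
    if len(returns) < 2:
        raise ValueError("need at least two returns to estimate volatility")

    # Convert the annual risk-free rate to a per-period rate.
    per_period_rf = risk_free_rate / periods_per_year
    excess = [r - per_period_rf for r in returns]

    mean_excess = statistics.fmean(excess)
    volatility = statistics.stdev(excess)  # sample standard deviation
    if volatility == 0:
        raise ValueError("volatility is zero; Sharpe ratio is undefined")

    # Scale the per-period ratio by sqrt(periods) to annualize it.
    return mean_excess / volatility * math.sqrt(periods_per_year)


if __name__ == "__main__":
    # Hypothetical daily returns, for illustration only.
    sample = [0.01, -0.01, 0.02, 0.0]
    print(round(annualized_sharpe(sample), 3))
```

The square-root scaling assumes returns are independent across periods, which is a simplification for a live trading record.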
Abstract
The alignment of Multi-Agent Systems (MAS) for autonomous software engineering is constrained by evaluator epistemic uncertainty. Current paradigms, such as Reinforcement Learning from Human Feedback (RLHF) and AI Feedback (RLAIF), frequently induce model sycophancy, while execution-based environments suffer from adversarial "Test Evasion" by unconstrained agents. In this paper, we introduce an objective alignment paradigm: Out-of-Money Reinforcement Learning (OOM-RL). By deploying agents into the non-stationary, high-friction reality of live financial markets, we utilize critical capital depletion as an un-hackable negative gradient. Our longitudinal 20-month empirical study (July 2024 to February 2026) chronicles the system's evolution from a high-turnover, sycophantic baseline to a robust, liquidity-aware architecture. We demonstrate that the undeniable ontological…
