When AI Trading Agents Compete: Adverse Selection of Meta-Orders by Reinforcement Learning-Based Market Making

Ali Raza Jafree; Konark Jain; Nick Firoozye

arXiv:2510.27334 · q-fin.TR · November 3, 2025

TL;DR

This paper uses reinforcement learning within a Hawkes Limit Order Book model to analyze how high-frequency market makers can adversely select medium-frequency traders executing meta-orders, and how this affects trading strategies.

Contribution

It introduces an RL-based market making framework that accounts for endogenous market impacts and demonstrates adverse selection against medium-frequency traders in a realistic market model.

Findings

1. RL market maker learns to exploit price drifts caused by meta-orders.

2. Adverse selection effects increase with high-frequency trading proliferation.

3. RL agent's profits do not significantly raise slippage costs for medium-frequency traders.

Abstract

We investigate the mechanisms by which medium-frequency trading agents are adversely selected by opportunistic high-frequency traders. We use reinforcement learning (RL) within a Hawkes Limit Order Book (LOB) model in order to replicate the behaviours of high-frequency market makers. In contrast to the classical models with exogenous price impact assumptions, the Hawkes model accounts for endogenous price impact and other key properties of the market (Jain et al. 2024a). Given the real-world impracticalities of the market maker updating strategies for every event in the LOB, we formulate the high-frequency market making agent via an impulse control reinforcement learning framework (Jain et al. 2025). The RL used in the simulation utilises Proximal Policy Optimisation (PPO) and self-imitation learning. To replicate the adverse selection phenomenon, we test the RL agent trading against a…
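
The self-exciting order arrivals that a Hawkes model builds on can be illustrated with a minimal sketch. This is not the paper's LOB model: it assumes a single event stream with an exponential kernel, simulated by Ogata's thinning algorithm, and the function name `simulate_hawkes` and all parameter values are hypothetical choices made for illustration.

```python
import math
import random


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate event times of a univariate Hawkes process on [0, horizon).

    Assumed intensity (illustrative, not taken from the paper):
        lambda(t) = mu + sum_i alpha * exp(-beta * (t - t_i))
    Sampling uses Ogata's thinning algorithm.
    """
    rng = random.Random(seed)
    events = []
    t = 0.0
    excitation = 0.0  # self-exciting part of the intensity at time t

    while True:
        # Between events the intensity only decays, so its current
        # value is a valid upper bound until the next accepted event.
        lam_bar = mu + excitation
        wait = rng.expovariate(lam_bar)
        if t + wait >= horizon:
            break

        # Move to the candidate time and decay the excitation to match.
        t += wait
        excitation *= math.exp(-beta * wait)
        lam = mu + excitation

        # Accept the candidate with probability lam / lam_bar.
        if rng.random() * lam_bar <= lam:
            events.append(t)
            excitation += alpha  # each event raises future intensity

    return events


# Hypothetical parameters; alpha / beta < 1 keeps the process stationary.
events = simulate_hawkes(mu=1.0, alpha=0.5, beta=1.0, horizon=200.0, seed=1)
print(len(events), "events simulated")
```

Each accepted event raises the intensity by `alpha`, so events cluster in time. A limit order book model in this family would couple several such streams, one per order type, which this sketch does not attempt.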

Taxonomy

Topics: Advanced Bandit Algorithms Research · Complex Systems and Time Series Analysis · stochastic dynamics and bifurcation