TradeFM: A Generative Foundation Model for Trade-flow and Market Microstructure

Maxime Kawawa-Beaudan; Srijan Sood; Kassiani Papasotiriou; Daniel Borrajo; Manuela Veloso

arXiv:2602.23784·cs.LG·March 2, 2026

Open Access · 3 Reviews

TL;DR

TradeFM is a 524M-parameter generative Transformer trained on billions of trade events across more than 9K equities. It captures market microstructure features and generalizes across markets, enabling applications such as synthetic data generation and trading strategies.

Contribution

We introduce TradeFM, a novel scale-invariant, multi-modal generative Transformer for market microstructure modeling, capable of zero-shot cross-market generalization.

Findings

01. TradeFM achieves 2-3x lower distributional error than baselines.

02. It reproduces key stylized facts of financial returns.

03. It generalizes well to out-of-distribution markets.

Abstract

Foundation models have transformed domains from language to genomics by learning general-purpose representations from large-scale, heterogeneous data. We introduce TradeFM, a 524M-parameter generative Transformer that brings this paradigm to market microstructure, learning directly from billions of trade events across >9K equities. To enable cross-asset generalization, we develop scale-invariant features and a universal tokenization scheme that map the heterogeneous, multi-modal event stream of order flow into a unified discrete sequence -- eliminating asset-specific calibration. Integrated with a deterministic market simulator, TradeFM-generated rollouts reproduce key stylized facts of financial returns, including heavy tails, volatility clustering, and absence of return autocorrelation. Quantitatively, TradeFM achieves 2-3x lower distributional error than Compound Hawkes baselines and…
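The three stylized facts named in the abstract can each be checked with a simple statistic. The sketch below is illustrative only and is not the paper's evaluation code: it measures heavy tails by excess kurtosis, absence of return autocorrelation by the lag-1 autocorrelation of returns, and volatility clustering by the lag-1 autocorrelation of absolute returns. The toy GARCH(1,1)-style generator, its parameters, and all function names are assumptions introduced here to produce test input.

```python
import numpy as np


def autocorr(x, lag):
    """Sample autocorrelation of x at a positive integer lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))


def excess_kurtosis(x):
    """Fourth standardized moment minus 3 (zero for a Gaussian)."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 4) - 3.0)


def stylized_facts(returns, lag=1):
    """Summary statistics for the three stylized facts."""
    return {
        "excess_kurtosis": excess_kurtosis(returns),          # heavy tails
        "return_acf": autocorr(returns, lag),                 # should be near 0
        "abs_return_acf": autocorr(np.abs(returns), lag),     # volatility clustering
    }


# Toy input (assumption): a GARCH(1,1)-style return series, not TradeFM output.
rng = np.random.default_rng(0)
n = 20_000
omega, alpha, beta = 0.05, 0.10, 0.85
returns = np.empty(n)
var = omega / (1.0 - alpha - beta)  # start at the unconditional variance
for t in range(n):
    returns[t] = np.sqrt(var) * rng.standard_normal()
    var = omega + alpha * returns[t] ** 2 + beta * var

facts = stylized_facts(returns)
print(facts)
```

On a series with these properties one would expect positive excess kurtosis, a return autocorrelation close to zero, and a clearly positive autocorrelation of absolute returns. As Reviewer 02 notes below, passing such checks is a weak test on its own.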

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 4

Strengths

The choice of a good research topic, Foundation Model, is a significant advantage for the paper. A clear formulation of trade-flow modelling under partial observability, including an EW-VWAP mid-price estimator and a scale-invariant feature representation that aims to align assets across liquidity regimes. A universal tokenisation via mixed-radix composite trade tokens (vocab 16,384) plus conditioning signals (liquidity bins, Δp, IMP), which yields a compact tabular embedding pipeline. A closed-
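To make the "mixed-radix composite trade token" idea concrete, the following is a minimal sketch of mixed-radix encoding in general, not the paper's tokenizer. The field names and the cardinalities in `RADICES` are hypothetical; they were chosen only so that their product equals the vocabulary size of 16,384 quoted in the review.

```python
from math import prod

# Hypothetical per-field cardinalities (assumption, not from the paper):
# side, price bin, volume bin, inter-arrival-time bin.
RADICES = (2, 16, 16, 32)
VOCAB_SIZE = prod(RADICES)  # 16384


def encode(digits, radices=RADICES):
    """Pack one discrete value per field into a single token id."""
    if len(digits) != len(radices):
        raise ValueError("expected one digit per radix")
    token = 0
    for d, r in zip(digits, radices):
        if not 0 <= d < r:
            raise ValueError(f"digit {d} out of range for radix {r}")
        token = token * r + d
    return token


def decode(token, radices=RADICES):
    """Invert encode: recover the per-field values from a token id."""
    digits = []
    for r in reversed(radices):
        token, d = divmod(token, r)
        digits.append(d)
    return tuple(reversed(digits))


example = (1, 2, 3, 4)
token = encode(example)
print(token, decode(token))
```

Because each field occupies its own "digit", every combination of field values maps to exactly one id in `range(VOCAB_SIZE)` and back, which is what lets a single embedding table cover a multi-field event.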

Weaknesses

(1) Scaling evidence is missing for a foundation model claim. The paper reports a single model size (251M) and a single held-out perplexity (17.85). Please provide reproducible scaling curves across model sizes, data fractions, and compute (e.g., 50M/150M/251M/500M; 25/50/100% tokens), with monotonic trends and power-law fits. Without multi-point fits, the FM positioning is under-supported.

(2) Novelty vs MaRS and strong baselines. The paper positions universality largely at the feature/tokeniz
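The power-law fit requested in point (1) is commonly done by linear regression in log-log space. The sketch below shows that procedure under stated assumptions: the model sizes are the ones the reviewer suggests, but the loss values are synthetic, generated from an arbitrary power law chosen for illustration. They are not results from the paper.

```python
import numpy as np


def fit_power_law(sizes, losses):
    """Fit loss ~ a * size**(-b) by least squares on log(loss) vs log(size)."""
    slope, intercept = np.polyfit(np.log(sizes), np.log(losses), 1)
    return float(np.exp(intercept)), float(-slope)


# Model sizes suggested in the review (parameters).
sizes = np.array([50e6, 150e6, 251e6, 500e6])

# Synthetic losses from an arbitrary power law (assumption, illustration only).
true_a, true_b = 400.0, 0.15
losses = true_a * sizes ** (-true_b)

a_hat, b_hat = fit_power_law(sizes, losses)
print(f"a = {a_hat:.3f}, b = {b_hat:.4f}")
```

With real measurements the points will not lie exactly on a line, so the fit residuals and the number of points matter. This is the reviewer's argument for reporting more than one model size.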

Reviewer 02 · Rating 2 · Confidence 4

Strengths

- The research problem is critical and challenging in financial economics.
- Design choices reflect market pragmatism. The model is learned from partial observations to mimic the realistic scenarios, which aligns with how practitioners actually see the market.

Weaknesses

- Tokenizer calibration on the first 30 days invites regime bias and drift.
- Validation for the quality of synthetic data using stylized facts is not convincing enough. For example, the lack of autocorrelation and heavy tails are easy for many generative models to achieve, but their practical usefulness is limited in financial applications.
- Baselines are limited. A zero-intelligence agent is a weak competitor.
- There is no strong evidence that synthetic data improves downstream tasks.

Reviewer 03 · Rating 4 · Confidence 4

Strengths

- Universal representation learned by TradeFM seems useful in downstream applications.
- The proposed approach directly works on raw data, without the need for human expertise.

Weaknesses

- The introduction of scale-invariant features in Section 5.3 is unclear for readers who are not familiar with this concept. Concrete, self-contained explanations and citations to prior works are needed. Personally, I prefer to involve some simple illustrations or examples to clarify this.
- This paper only evaluates the fidelity of synthetic data generation using stylized facts. To evaluate the fidelity of synthetic time series, there are many other metrics, including goodness of fit and

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Complex Systems and Time Series Analysis · Stock Market Forecasting Methods