Detecting Multilevel Manipulation from Limit Order Book via Cascaded Contrastive Representation Learning

Yushi Lin; Peng Yang

arXiv:2508.17086·q-fin.CP·October 13, 2025

Open Access · 5 Reviews

TL;DR

This paper introduces a novel contrastive learning framework that effectively detects complex multilevel spoofing manipulations in limit order books, significantly improving detection accuracy and providing insights into hierarchical market anomalies.

Contribution

It proposes a cascaded LOB representation combined with supervised contrastive learning, advancing the detection of covert multilevel market manipulations beyond existing single-level approaches.

Findings

01 Achieves state-of-the-art detection performance with Transformer models

02 Improves robustness and accuracy across diverse models

03 Provides systematic analysis of multilevel manipulation patterns

Abstract

Trade-based manipulation (TBM) undermines the fairness and stability of financial markets drastically. Spoofing, one of the most covert and deceptive TBM strategies, exhibits complex anomaly patterns across multilevel prices, while often being simplified as a single-level manipulation. These patterns are usually concealed within the rich, hierarchical information of the Limit Order Book (LOB), which is challenging to leverage due to high dimensionality and noise. To address this, we propose a representation learning framework combining a cascaded LOB representation architecture with supervised contrastive learning. Extensive experiments demonstrate that our framework consistently improves detection performance across diverse models, with Transformer-based architectures achieving state-of-the-art results. In addition, we conduct systematic analyses and ablation studies to investigate…
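The supervised contrastive objective named in the abstract can be illustrated with a minimal NumPy sketch of the loss from Khosla et al. (2020), which one reviewer notes the paper follows directly. This is not the authors' implementation: the function name `supcon_loss`, the default temperature, and the toy labels (0 for a normal LOB window, 1 for a manipulated one) are assumptions made for illustration.

```python
import numpy as np

def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over one batch (hypothetical helper).

    embeddings: (n, d) array of window representations.
    labels:     (n,) array of class ids, e.g. 0 = normal, 1 = manipulated
                (an assumed labeling, not taken from the paper).
    """
    z = np.asarray(embeddings, dtype=float)
    y = np.asarray(labels)
    n = len(y)

    # L2-normalise so that dot products are cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temperature

    # An anchor is never contrasted against itself.
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)

    # Row-wise log-softmax over all other samples (numerically stabilised).
    row_max = sim.max(axis=1, keepdims=True)
    log_norm = np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - row_max - log_norm

    # Positives: other samples that share the anchor's label.
    pos_mask = (y[:, None] == y[None, :]) & ~self_mask
    n_pos = pos_mask.sum(axis=1)
    valid = n_pos > 0
    if not valid.any():
        return 0.0  # no anchor has a positive partner in this batch

    # Mean log-probability of the positives, averaged over valid anchors.
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return float((-pos_log_prob[valid] / n_pos[valid]).mean())

# Toy batch: two tight clusters of embeddings.
batch = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]])
aligned = supcon_loss(batch, [0, 0, 1, 1])   # labels match the clusters
mixed = supcon_loss(batch, [0, 1, 0, 1])     # labels cut across the clusters
```

In this toy batch the loss is lower when same-label samples sit close together (`aligned`) than when the labels cut across the clusters (`mixed`), which is the property the objective rewards during training.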

Peer Reviews

Decision·ICLR 2026 Conference Withdrawn Submission

Reviewer 01 · Rating 2 · Confidence 5

Strengths

+ Manipulation detection is an interesting and practically important problem for financial markets, with clear societal impact.
+ Presents a representation-learning framework tailored to LOB data, aiming to leverage multilevel (depth >1) structure during training.

Weaknesses

- Technical novelty is limited: the sequence model is essentially a Transformer, and it is unclear how the LOB's rich information is used to capture the behavioral patterns of trade manipulation.
- The two-stage pipeline relies heavily on label quality, which raises concerns about robustness and practical deployability.
- Experimental design mainly contrasts different representation learners under the same anomaly detectors; this only shows representation quality indirectly. It remains unclear how the me

Reviewer 02 · Rating 2 · Confidence 3

Strengths

- The paper is well written. Everything is very clearly explained, with additional details available in comprehensive appendices. Motivation and background are well sourced with many citations supporting the main claims.
- The proposed improvements are interesting and seem to meaningfully improve upon previous work in this domain.
- Results show that the proposed encoding for multi-level LOB data significantly improves detection when used in conjunction with previous state of the art contrastiv

Weaknesses

The following are, in my opinion, the main weaknesses of the paper, roughly in order of importance.

### 1. Scope and Novelty

This is fundamentally an applied paper that would seem to be more suitable for a conference with an applied track or a domain-specific conference or journal. While the paper is well written, the problem being solved has a very narrow scope within a specific application domain and the proposed solution, while interesting, does not innovate by proposing more generally appl

Reviewer 03 · Rating 4 · Confidence 2

Strengths

1. The paper introduces a well-motivated and technically sound framework that effectively captures hierarchical information in the Limit Order Book through a cascaded Transformer-based representation architecture.
2. It demonstrates consistent and meaningful performance improvements across multiple baseline models, supported by extensive experiments and systematic ablation studies.
3. The work provides valuable insights into how contrastive representation learning can enhance anomaly detection i

Weaknesses

1. The performance improvement is inconsistent, and in several cases the proposed method even leads to performance drops compared with baselines.
2. The novelty is limited; as a task-specific approach that extends standard time-series modeling with an additional hierarchical dimension, the proposed design feels like a natural rather than a fundamentally new solution.
3. The evaluation relies on only one dataset. Although the dataset is sufficiently large, testing on additional datasets would be

Reviewer 04 · Rating 2 · Confidence 4

Strengths

- The motivation is well written, and it is easy to see what problem this work aims to solve (detecting multilevel spoofing), making the research direction well-motivated.
- The paper conducts a comprehensive experimental evaluation across 6 different representation models and 2 downstream detectors, systematically comparing the proposed model against a baseline.
- The paper provides thorough ablation studies and sensitivity analyses to validate its components, including the contribution of the

Weaknesses

- The entire evaluation relies on *synthetic manipulations* injected into LOBSTER data. While the procedure is documented, it remains unknown whether these patterns realistically mimic genuine spoofing. No validation or sensitivity analysis is provided comparing synthetic and real manipulative behaviors. This makes empirical conclusions about “state-of-the-art performance” somewhat weak.
- The proposed framework largely builds upon existing paradigms, specifically, a stacked autoencoder combined

Reviewer 05 · Rating 4 · Confidence 2

Strengths

- The paper structure is clear and easy to follow.
- The paper identifies multilevel rather than single-level manipulation.

Weaknesses

- The key novelty is a bit unclear. The overall architecture is well-known. The “cascaded” design is a concatenation of LOB embeddings and manual features. The contrastive component follows Khosla et al. (2020) directly, without any modification for the financial or LOB context, which does not fully support the claim of a "novel LOB-based representation learning framework".
- The paper would benefit from further analysis of the interpretation of what features or inter-level interactions m

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Stock Market Forecasting Methods · Time Series Analysis and Forecasting