Layout-Aware Representation Learning for Open-Set ID Fraud Discovery
Jinxing Li, Nicholas Ren, Cathy Chang, Hongkai Pan, Daniel George

TL;DR
This paper introduces a layout-aware representation learning approach for open-set ID fraud detection, effectively identifying novel and campaign-scale fraud cases under distribution shifts.
Contribution
It adapts DINOv3 with context-aware fine-tuning and metric learning for layout-aware embeddings, enabling detection of unseen fraud cases beyond closed-set classification.
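The summary does not give the exact terms of the composite metric-learning loss, so the following is only a minimal sketch of the general idea, assuming one term that pulls embeddings toward their class centroid (intra-class compactness) and one hinge term that pushes class centroids at least a margin apart (inter-class separability). The function names, `margin`, and `weight` are hypothetical and are not taken from the paper.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def composite_loss(embeddings, labels, margin=1.0, weight=1.0):
    """Illustrative loss: compactness term plus weighted separability term.

    This is an assumed formulation, not the loss used in the paper.
    """
    by_class = {}
    for vec, label in zip(embeddings, labels):
        by_class.setdefault(label, []).append(vec)
    centers = {label: centroid(vecs) for label, vecs in by_class.items()}

    # Intra-class compactness: mean squared distance to the own-class centroid.
    compact = sum(
        sq_dist(vec, centers[label]) for vec, label in zip(embeddings, labels)
    ) / len(embeddings)

    # Inter-class separability: hinge penalty when two centroids are closer
    # than `margin`.
    names = sorted(centers)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    separate = 0.0
    if pairs:
        separate = sum(
            max(0.0, margin - math.sqrt(sq_dist(centers[a], centers[b])))
            for a, b in pairs
        ) / len(pairs)

    return compact + weight * separate

# Two tight, well-separated layout classes incur no penalty.
well_separated = composite_loss([[0, 0], [0, 0], [5, 5], [5, 5]], [0, 0, 1, 1])
# Two classes that fully overlap are penalized by both terms.
overlapping = composite_loss([[0, 0], [1, 1], [0, 0], [1, 1]], [0, 0, 1, 1])
```

In a real training loop this value would be computed on batches of model embeddings with an automatic-differentiation framework; the pure-Python version here only shows what the two terms reward.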
Findings
Achieves 99.83% layout classification accuracy on Canadian layouts, although the model is trained on U.S. IDs only.
Surfaces 276 fraud cases from 20,448 IDs, including 222 previously undetected.
Supports similarity-based expansion from a single seed to related fraud cases.
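Similarity-based expansion from a seed can be illustrated with a small nearest-neighbor query over embeddings. The sketch below assumes cosine similarity and a fixed threshold; the summary does not state which similarity measure or cutoff the authors use, and `expand_from_seed`, `threshold`, and the toy vectors are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def expand_from_seed(seed_id, embeddings, threshold=0.9):
    """Return (doc_id, similarity) pairs at or above `threshold`, best first.

    `embeddings` maps a document ID to its embedding vector. The threshold
    is an assumed value for illustration only.
    """
    seed_vec = embeddings[seed_id]
    hits = []
    for doc_id, vec in embeddings.items():
        if doc_id == seed_id:
            continue
        score = cosine_similarity(seed_vec, vec)
        if score >= threshold:
            hits.append((doc_id, score))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Toy 3-dimensional embeddings; real embeddings would come from the model.
toy_embeddings = {
    "seed": [1.0, 0.0, 0.0],
    "near_a": [0.9, 0.1, 0.0],
    "near_b": [0.8, 0.0, 0.2],
    "far": [0.0, 1.0, 0.0],
}
related = expand_from_seed("seed", toy_embeddings, threshold=0.9)
related_ids = [doc_id for doc_id, _ in related]
```

With these toy vectors, `related_ids` contains `near_a` and `near_b` in that order, while `far` is excluded. At the scale of tens of thousands of IDs, an approximate nearest-neighbor index would likely replace the linear scan.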
Abstract
Identity-document fraud detection is not a stationary binary classification problem. Adaptive attackers modify templates and fabrication pipelines, making historical fraud labels stale, and successful forgeries recur at scale as coherent campaigns. We therefore study layout-aware representation learning for open-set fraud discovery rather than only closed-set classification. We adapt DINOv3 to the document domain via context-aware SimMIM fine-tuning and supervised metric learning with a composite loss that encourages inter-class separability and intra-class compactness. The model is trained on U.S. IDs only. With a lightweight MLP and softmax classifier, the embedding achieves 99.83% layout classification accuracy on Canadian layouts. Moreover, on a dataset of 20,448 Canadian IDs, embedding-space analysis surfaces 276 adaptive physical-fraud cases, including 222 not surfaced by…
