Layout-Aware Representation Learning for Open-Set ID Fraud Discovery

Jinxing Li; Nicholas Ren; Cathy Chang; Hongkai Pan; Daniel George

arXiv:2605.05215 · cs.CV · May 8, 2026

TL;DR

This paper introduces a layout-aware representation learning approach for open-set ID fraud discovery. Its embeddings surface novel and campaign-scale fraud cases under distribution shift.

Contribution

The paper adapts DINOv3 to the document domain through context-aware SimMIM fine-tuning and supervised metric learning. The resulting layout-aware embeddings support discovery of unseen fraud cases beyond closed-set classification.

Findings

01

Achieves 99.83% layout classification accuracy on Canadian layouts with a lightweight MLP and softmax classifier, although the model is trained on U.S. IDs only.

02

Surfaces 276 adaptive physical-fraud cases from a dataset of 20,448 Canadian IDs through embedding-space analysis, including 222 previously undetected.

03

Supports similarity-based expansion from a single seed to related fraud cases.
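
The similarity-based expansion in the third finding can be pictured with a small sketch. This is not the authors' pipeline: the cosine-similarity measure, the `threshold` value, the function names, and the toy three-dimensional vectors are all assumptions made for illustration. In practice the vectors would be embeddings produced by the fine-tuned model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def expand_from_seed(seed_id, embeddings, threshold=0.9):
    """Return (doc_id, similarity) pairs at or above the threshold, most similar first.

    `embeddings` maps a document ID to its embedding vector. The threshold
    is a hypothetical operating point, not a value taken from the paper.
    """
    seed_vec = embeddings[seed_id]
    hits = []
    for doc_id, vec in embeddings.items():
        if doc_id == seed_id:
            continue  # the seed itself is not a new candidate
        score = cosine_similarity(seed_vec, vec)
        if score >= threshold:
            hits.append((doc_id, score))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy embeddings (hypothetical): two documents near the seed, one far away.
toy_embeddings = {
    "seed": [1.0, 0.0, 0.0],
    "near_a": [0.9, 0.1, 0.0],
    "near_b": [0.8, 0.0, 0.2],
    "far": [0.0, 1.0, 0.0],
}
related = expand_from_seed("seed", toy_embeddings, threshold=0.9)
```

Here `related` lists the two nearby documents and omits the distant one. A reviewer could then confirm or reject each candidate, and confirmed cases could serve as further seeds.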

Abstract

Identity-document fraud detection is not a stationary binary classification problem. Adaptive attackers modify templates and fabrication pipelines, making historical fraud labels stale, and successful forgeries recur at scale as coherent campaigns. We therefore study layout-aware representation learning for open-set fraud discovery rather than only closed-set classification. We adapt DINOv3 to the document domain via context-aware SimMIM fine-tuning and supervised metric learning with a composite loss that encourages inter-class separability and intra-class compactness. The model is trained on U.S. IDs only. With a lightweight MLP and softmax classifier, the embedding achieves 99.83% layout classification accuracy on Canadian layouts. Moreover, on a dataset of 20,448 Canadian IDs, embedding-space analysis surfaces 276 adaptive physical-fraud cases, including 222 not surfaced by…
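
The abstract does not spell out the terms of the composite loss, so the following is only one plausible shape for an objective that rewards intra-class compactness and inter-class separability. The centroid-based formulation, the hinge `margin`, the `weight` parameter, and all function names are assumptions for illustration, not the paper's definition.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def composite_loss(embeddings, labels, margin=1.0, weight=1.0):
    """Compactness term plus a weighted separability term (assumed form).

    Compactness: mean squared distance of each embedding to its class centroid.
    Separability: mean hinge penalty on centroid pairs closer than `margin`.
    """
    by_class = defaultdict(list)
    for vec, label in zip(embeddings, labels):
        by_class[label].append(vec)
    centroids = {label: centroid(vecs) for label, vecs in by_class.items()}

    compactness = sum(
        euclidean(vec, centroids[label]) ** 2
        for vec, label in zip(embeddings, labels)
    ) / len(embeddings)

    class_ids = sorted(centroids)
    pair_terms = [
        max(0.0, margin - euclidean(centroids[a], centroids[b]))
        for i, a in enumerate(class_ids)
        for b in class_ids[i + 1:]
    ]
    separability = sum(pair_terms) / len(pair_terms) if pair_terms else 0.0
    return compactness + weight * separability


# Toy 2-D embeddings (hypothetical): the same points under two labelings.
points = [[0.0, 0.0], [0.0, 0.2], [3.0, 0.0], [3.0, 0.2]]
loss_clustered = composite_loss(points, [0, 0, 1, 1])  # labels match the clusters
loss_mixed = composite_loss(points, [0, 1, 0, 1])      # labels cut across clusters
```

With these toy points, the labeling that matches the two clusters yields a much lower loss than the labeling that cuts across them, which is the behavior such an objective is meant to encourage. A trainable version would express the same terms in an autodiff framework.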
