Sim2Real Docs: Domain Randomization for Documents in Natural Scenes using Ray-traced Rendering
Nikhil Maddikunta, Huijun Zhao, Sumit Keswani, Alfy Samuel, Fu-Ming Guo, Nishan Srishankar, Vishwa Pardeshi, Austin Huang

TL;DR
This paper introduces Sim2Real Docs, a framework that uses ray-traced rendering to generate synthetic datasets of documents in natural scenes for improved machine learning-based document analysis.
Contribution
It presents a novel method for synthesizing realistic document datasets with domain randomization using Blender's ray-traced rendering, addressing data scarcity and variability issues.
Findings
Synthetic datasets improve model robustness in natural scenes
Ray-traced rendering enhances realism of synthetic documents
Framework allows for unlimited task-specific training data
Abstract
In the past, computer vision systems for digitized documents could rely on systematically captured, high-quality scans. Today, transactions involving digital documents are more likely to start as mobile phone photo uploads taken by non-professionals. As such, computer vision for document automation must now account for documents captured in natural scene contexts. An additional challenge is that task objectives for document processing can be highly use-case specific, which makes publicly-available datasets limited in their utility, while manual data labeling is also costly and poorly translates between use cases. To address these issues we created Sim2Real Docs - a framework for synthesizing datasets and performing domain randomization of documents in natural scenes. Sim2Real Docs enables programmatic 3D rendering of documents using Blender, an open source tool for 3D modeling and…
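To make the idea of domain randomization concrete, here is a minimal sketch in plain Python of how per-image scene parameters might be sampled before each render. It is not the authors' implementation: the `SceneParams` fields, the numeric ranges, and the `sample_scene` and `sample_dataset` helpers are all hypothetical, and the hand-off to Blender (for example through its `bpy` Python module) is deliberately left out.

```python
import random
from dataclasses import dataclass


@dataclass
class SceneParams:
    """One randomized scene configuration (all fields are illustrative assumptions)."""
    camera_distance_m: float   # distance from camera to document
    camera_tilt_deg: float     # deviation from a top-down view
    light_strength_w: float    # lamp power
    paper_rotation_deg: float  # in-plane rotation of the document
    background_id: int         # index into a hypothetical pool of background textures


def sample_scene(rng: random.Random, n_backgrounds: int = 10) -> SceneParams:
    """Draw one scene configuration uniformly from assumed ranges."""
    return SceneParams(
        camera_distance_m=rng.uniform(0.3, 0.8),
        camera_tilt_deg=rng.uniform(0.0, 35.0),
        light_strength_w=rng.uniform(100.0, 1000.0),
        paper_rotation_deg=rng.uniform(-180.0, 180.0),
        background_id=rng.randrange(n_backgrounds),
    )


def sample_dataset(n: int, seed: int = 0) -> list[SceneParams]:
    """Sample n scene configurations; a fixed seed makes the dataset reproducible."""
    rng = random.Random(seed)
    return [sample_scene(rng) for _ in range(n)]


if __name__ == "__main__":
    for params in sample_dataset(3, seed=42):
        print(params)
```

In a full pipeline, each sampled configuration would be applied to the 3D scene and rendered once, with the ground-truth labels (such as document corners) derived from the known scene geometry rather than from manual annotation.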
Taxonomy
Topics: Computer Graphics and Visualization Techniques · 3D Surveying and Cultural Heritage · Advanced Vision and Imaging
Methods: Softmax · RoIAlign · RoIPool
