SIDOD: A Synthetic Image Dataset for 3D Object Pose Recognition with Distractors

Mona Jalal; Josef Spjut; Ben Boudaoud; Margrit Betke

arXiv:2008.05955·cs.CV·August 14, 2020

TL;DR

SIDOD is a synthetic image dataset for 3D object pose recognition. It provides 144k stereo image pairs of randomized scenes with flying distractors, rendered from 18 camera viewpoints with pixel-level RGB, depth, segmentation, and surface normal images, to support research in object detection, pose estimation, and tracking.

Contribution

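The paper introduces SIDOD, a publicly available synthetic dataset generated with the NVIDIA Deep Learning Data Synthesizer and intended for object detection, pose estimation, and tracking. Its scenes randomize object and camera pose, lighting, and the number of objects and distractors, and each view is provided in four image modalities.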

Findings

01

The dataset contains 144k stereo image pairs that combine 18 camera viewpoints of three photorealistic virtual environments.

02

Each view includes pixel-level RGB, depth, segmentation, and surface normal images.

03

Domain randomization covers object and camera pose, scene lighting, and the number of objects and distractors.

Abstract

We present a new, publicly-available image dataset generated by the NVIDIA Deep Learning Data Synthesizer intended for use in object detection, pose estimation, and tracking applications. This dataset contains 144k stereo image pairs that synthetically combine 18 camera viewpoints of three photorealistic virtual environments with up to 10 objects (chosen randomly from the 21 object models of the YCB dataset [1]) and flying distractors. Object and camera pose, scene lighting, and quantity of objects and distractors were randomized. Each provided view includes RGB, depth, segmentation, and surface normal images, all pixel level. We describe our approach for domain randomization and provide insight into the decisions that produced the dataset.
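The randomization described in the abstract can be illustrated with a minimal sketch that samples one scene configuration. This is not the NVIDIA Deep Learning Data Synthesizer API, and it is not the authors' generation code. The counts of environments, viewpoints, object models, and maximum objects come from the abstract; everything else is a hypothetical assumption made for illustration, including the `SceneConfig` type, the `sample_scene` function, the minimum of one object, sampling objects without repetition, the distractor cap, and the lighting range.

```python
import random
from dataclasses import dataclass
from typing import List

# Values stated in the abstract.
NUM_ENVIRONMENTS = 3    # photorealistic virtual environments
NUM_VIEWPOINTS = 18     # camera viewpoints
NUM_YCB_MODELS = 21     # YCB object models to choose from
MAX_OBJECTS = 10        # "up to 10 objects" per scene

# Assumptions for illustration only; the abstract gives no such numbers.
MAX_DISTRACTORS = 20
LIGHT_RANGE = (0.2, 1.0)


@dataclass
class SceneConfig:
    """Hypothetical container for one randomized scene."""
    environment: int
    viewpoint: int
    object_ids: List[int]
    num_distractors: int
    light_intensity: float


def sample_scene(rng: random.Random) -> SceneConfig:
    """Draw one scene configuration uniformly at random."""
    # Assumption: at least one object, and no model appears twice.
    num_objects = rng.randint(1, MAX_OBJECTS)
    object_ids = sorted(rng.sample(range(NUM_YCB_MODELS), num_objects))
    return SceneConfig(
        environment=rng.randrange(NUM_ENVIRONMENTS),
        viewpoint=rng.randrange(NUM_VIEWPOINTS),
        object_ids=object_ids,
        num_distractors=rng.randint(0, MAX_DISTRACTORS),
        light_intensity=rng.uniform(*LIGHT_RANGE),
    )


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    for _ in range(3):
        print(sample_scene(rng))
```

Passing an explicit `random.Random` instance keeps the sampling reproducible, which is convenient when a generated scene needs to be recreated later.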
