Domain Randomization and Pyramid Consistency: Simulation-to-Real Generalization without Accessing Target Domain Data
Xiangyu Yue, Yang Zhang, Sicheng Zhao, Alberto, Sangiovanni-Vincentelli, Kurt Keutzer, Boqing Gong

TL;DR
This paper introduces a novel simulation-to-real generalization method for semantic segmentation that uses domain randomization and pyramid consistency, achieving state-of-the-art results without target domain data.
Contribution
It presents a new approach combining domain randomization and pyramid consistency to improve generalization in semantic segmentation without target domain access.
Findings
Outperforms existing methods on multiple datasets
Achieves results comparable or better than domain adaptation techniques
Enhances domain-invariant and scale-invariant feature learning
Abstract
We propose to harness the potential of simulation for the semantic segmentation of real-world self-driving scenes in a domain generalization fashion. The segmentation network is trained without any data of target domains and tested on the unseen target domains. To this end, we propose a new approach of domain randomization and pyramid consistency to learn a model with high generalizability. First, we propose to randomize the synthetic images with the styles of real images in terms of visual appearances using auxiliary datasets, in order to effectively learn domain-invariant representations. Second, we further enforce pyramid consistency across different "stylized" images and within an image, in order to learn domain-invariant and scale-invariant features, respectively. Extensive experiments are conducted on the generalization from GTA and SYNTHIA to Cityscapes, BDDS and Mapillary; and…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsDomain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Human Pose and Action Recognition
