Synbols: Probing Learning Algorithms with Synthetic Datasets
Alexandre Lacoste, Pau Rodríguez, Frédéric Branchaud-Charron, Parmida Atighehchian, Massimo Caccia, Issam Laradji, Alexandre Drouin, Matt Craddock, Laurent Charlin, David Vázquez

TL;DR
Synbols is a versatile synthetic dataset generator that helps researchers systematically test and analyze the limitations of various machine learning algorithms across multiple learning paradigms.
Contribution
We introduce Synbols, a new tool for creating customizable synthetic datasets to probe and understand learning algorithms' strengths and weaknesses.
Findings
Identified specific failure modes of algorithms using Synbols datasets
Demonstrated the tool's ability to generate diverse and complex data distributions
Showcased insights into algorithm robustness across different learning setups
Abstract
Progress in the field of machine learning has been fueled by the introduction of benchmark datasets pushing the limits of existing algorithms. Enabling the design of datasets to test specific properties and failure modes of learning algorithms is thus a problem of high interest, as it has a direct impact on innovation in the field. In this sense, we introduce Synbols -- Synthetic Symbols -- a tool for rapidly generating new datasets with a rich composition of latent features rendered in low-resolution images. Synbols leverages the large number of symbols available in the Unicode standard and the wide range of artistic fonts provided by the open font community. Our tool's high-level interface provides a language for rapidly generating new distributions on the latent features, including various types of textures and occlusions. To showcase the versatility of Synbols, we use it to dissect…
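The idea of defining a distribution over latent features and sampling a dataset from it can be illustrated with a small standalone sketch. This is not the Synbols API: the class `SymbolLatents`, the functions `sample_latents` and `make_dataset`, the attribute names, the value ranges, and the font names are all hypothetical placeholders chosen for illustration, and the rendering step that would turn each record into an image is left out.

```python
import random
from dataclasses import dataclass

# Hypothetical latent attributes for one symbol (not taken from Synbols).
@dataclass(frozen=True)
class SymbolLatents:
    char: str
    font: str
    rotation_deg: float
    scale: float
    occluded: bool

CHARS = "abcdefghij"                        # assumed symbol set
FONTS = ["serif-a", "sans-b", "script-c"]   # placeholder font names

def sample_latents(rng, occlusion_prob=0.0):
    """Draw one set of latent features from simple independent priors."""
    return SymbolLatents(
        char=rng.choice(CHARS),
        font=rng.choice(FONTS),
        rotation_deg=rng.uniform(-30.0, 30.0),
        scale=rng.uniform(0.5, 1.0),
        occluded=rng.random() < occlusion_prob,
    )

def make_dataset(n, seed=0, occlusion_prob=0.0):
    """Sample n latent records reproducibly from a seeded generator."""
    rng = random.Random(seed)
    return [sample_latents(rng, occlusion_prob) for _ in range(n)]

if __name__ == "__main__":
    for record in make_dataset(3, seed=42, occlusion_prob=0.5):
        print(record)
```

Changing a single argument such as `occlusion_prob` yields a new distribution while everything else stays fixed, which is the kind of controlled variation that lets a specific failure mode be isolated.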
Taxonomy
Topics: Machine Learning and Algorithms · Handwritten Text Recognition Techniques · Machine Learning and Data Classification
