DABS: A Domain-Agnostic Benchmark for Self-Supervised Learning
Alex Tamkin, Vincent Liu, Rongfei Lu, Daniel Fein, Colin Schultz, Noah Goodman

TL;DR
DABS introduces a comprehensive benchmark to evaluate self-supervised learning algorithms across diverse domains, highlighting the need for more domain-agnostic methods to advance the field.
Contribution
The paper presents DABS, a new benchmark with seven diverse domains for evaluating domain-agnostic self-supervised learning algorithms, along with baseline methods e-Mix and ShED.
Findings
Baseline algorithms perform modestly across domains
Significant progress is needed for domain-agnostic self-supervised learning
The benchmark facilitates evaluation of algorithms in diverse real-world settings
Abstract
Self-supervised learning algorithms, including BERT and SimCLR, have enabled significant strides in fields like natural language processing, computer vision, and speech processing. However, these algorithms are domain-specific, meaning that new self-supervised learning algorithms must be developed for each new setting, including myriad healthcare, scientific, and multimodal domains. To catalyze progress toward domain-agnostic methods, we introduce DABS: a Domain-Agnostic Benchmark for Self-supervised learning. To perform well on DABS, an algorithm is evaluated on seven diverse domains: natural images, multichannel sensor data, English text, speech recordings, multilingual text, chest x-rays, and images with text descriptions. Each domain contains an unlabeled dataset for pretraining; the model is then scored based on its downstream performance on a set of labeled tasks in the domain.…
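The protocol the abstract describes (pretrain on a domain's unlabeled data, then score on that domain's labeled tasks) can be sketched roughly as below. This is an illustrative outline only, not the DABS codebase: the `Domain` class, `run_benchmark`, and the toy `pretrain` / `evaluate` callables are hypothetical names chosen for this example, and the toy "encoder" is just a mean threshold standing in for a real model.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Sequence, Tuple


@dataclass
class Domain:
    """Hypothetical container: one benchmark domain."""
    name: str
    unlabeled: Sequence[Any]                        # pretraining data, no labels
    tasks: Dict[str, Sequence[Tuple[Any, int]]]     # task name -> (input, label) pairs


def run_benchmark(
    domains: List[Domain],
    pretrain: Callable[[Sequence[Any]], Any],
    evaluate: Callable[[Any, Sequence[Tuple[Any, int]]], float],
) -> Dict[str, Dict[str, float]]:
    """Apply the SAME pretrain/evaluate functions to every domain."""
    results: Dict[str, Dict[str, float]] = {}
    for domain in domains:
        # Step 1: self-supervised pretraining on unlabeled data only.
        encoder = pretrain(domain.unlabeled)
        # Step 2: score the pretrained model on each labeled downstream task.
        results[domain.name] = {
            task_name: evaluate(encoder, examples)
            for task_name, examples in domain.tasks.items()
        }
    return results


# Toy stand-ins (assumptions for illustration, not real SSL methods).
def toy_pretrain(unlabeled: Sequence[float]) -> float:
    # "Learns" a single threshold: the mean of the unlabeled values.
    return sum(unlabeled) / len(unlabeled)


def toy_evaluate(threshold: float, examples: Sequence[Tuple[float, int]]) -> float:
    # Accuracy of predicting label 1 when the input exceeds the threshold.
    correct = sum(int(x > threshold) == y for x, y in examples)
    return correct / len(examples)


toy_domain = Domain(
    name="toy",
    unlabeled=[0.0, 1.0, 2.0, 3.0],
    tasks={"threshold": [(0.5, 0), (2.5, 1), (1.0, 1)]},
)
scores = run_benchmark([toy_domain], toy_pretrain, toy_evaluate)
```

The point of the sketch is that a single `pretrain` function is reused unchanged across all domains; how scores are aggregated across tasks or domains is left open here because the summary above does not specify it.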
Taxonomy
Topics: Speech Recognition and Synthesis · Multimodal Machine Learning Applications · Natural Language Processing Techniques
Methods: Attention Is All You Need · Linear Layer · 1x1 Convolution · Max Pooling · Average Pooling · Attention Dropout · Adam · Dropout · Residual Block
