Are Deep Sequence Classifiers Good at Non-Trivial Generalization?
Francesco Cazzaro, Ariadna Quattoni, Xavier Carreras

TL;DR
This paper investigates whether deep sequence classifiers can genuinely learn the underlying class distribution in binary, sparse sequence classification tasks, beyond mere data compression, and finds they are capable of proper generalization.
Contribution
The study introduces an evaluation method to distinguish between sequence compression and true generalization in deep sequence classifiers for sparse binary problems.
Findings
Deep models can learn the target class distribution beyond data compression.
Evaluation disentangles compression from true model generalization.
Models demonstrate non-trivial generalization in sparse sequence classification.
Abstract
Recent advances in deep learning models for sequence classification have greatly improved their classification accuracy, especially when large training sets are available. However, several works have suggested that under some settings the predictions made by these models are poorly calibrated. In this work we study binary sequence classification problems and we look at model calibration from a different perspective by asking the question: Are deep learning models capable of learning the underlying target class distribution? We focus on sparse sequence classification, that is, problems in which the target class is rare, and compare three deep learning sequence classification models. We develop an evaluation that measures how well a classifier is learning the target class distribution. In addition, our evaluation disentangles good performance achieved by mere compression of the training…
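For readers unfamiliar with the calibration notion the abstract refers to, the following is a minimal sketch of a standard binned calibration gap (often called expected calibration error) for a binary classifier. It is not the evaluation proposed in the paper, whose details are not given here; the function name, the choice of ten equal-width bins, and the toy inputs are illustrative assumptions.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Binned calibration gap for binary predictions.

    probs  : predicted probabilities of the positive (target) class, in [0, 1]
    labels : true labels, 0 or 1
    Returns the bin-size-weighted mean of |mean predicted prob - observed positive rate|.
    """
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and of equal length")

    # Assign each prediction to an equal-width bin; p == 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_prob = sum(p for p, _ in members) / len(members)
        positive_rate = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(mean_prob - positive_rate)
    return ece


if __name__ == "__main__":
    # Toy rare-class example: the model says 0.9 everywhere, but only 1 in 10 is positive.
    overconfident = expected_calibration_error([0.9] * 10, [1] + [0] * 9)
    print(f"calibration gap: {overconfident:.3f}")
```

A gap near zero means predicted probabilities match observed frequencies within each bin. In sparse problems most predictions fall in the lowest bins, so a summary like this can look good even when the rare class is modeled poorly, which is one reason to evaluate how well the target class distribution itself is learned.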
Taxonomy
Topics: Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning · Anomaly Detection Techniques and Applications
