Training Efficiency and Robustness in Deep Learning
Fartash Faghri

TL;DR
This thesis explores methods to improve training efficiency and robustness in deep learning, including data prioritization, optimization improvements, and adversarial robustness strategies, and offers both theoretical and practical insights.
Contribution
It introduces techniques such as hard negative mining, redundancy-aware sampling, and gradient clustering, and provides a theoretical analysis of robustness in linear models.
Findings
Prioritizing informative data accelerates convergence and improves generalization.
Hard negative mining adds no computational overhead to training.
Optimal robustness in linear models depends on the choice of optimizer, regularization, or architecture.
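To see why the learned weights matter for robustness, the sketch below uses a standard property of linear classifiers (not the thesis's own analysis): for f(x) = w·x + b, the smallest L_p-norm perturbation that moves x onto the decision boundary is |w·x + b| / ‖w‖_q, where q is the dual exponent with 1/p + 1/q = 1. Two weight vectors can assign a point the same score yet differ in margin, so whichever w the optimizer, regularizer, or architecture selects determines how robust the model is. The function names and example vectors are hypothetical.

```python
import math


def lp_norm(v, p):
    """L_p norm of a vector given as a list of floats (p >= 1 or math.inf)."""
    if p == math.inf:
        return max(abs(a) for a in v)
    return sum(abs(a) ** p for a in v) ** (1.0 / p)


def dual_exponent(p):
    """Return q with 1/p + 1/q = 1 (the Hölder conjugate of p)."""
    if p < 1:
        raise ValueError("p must be >= 1")
    if p == 1:
        return math.inf
    if p == math.inf:
        return 1.0
    return p / (p - 1.0)


def min_perturbation(w, b, x, p):
    """Smallest L_p-norm change to x that reaches the boundary w.x + b = 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return abs(score) / lp_norm(w, dual_exponent(p))


# Two hypothetical weight vectors that give x the same positive score (3.0).
x = [1.0, 2.0]
w_spread = [1.0, 1.0]
w_sparse = [3.0, 0.0]
for name, w in [("spread", w_spread), ("sparse", w_sparse)]:
    print(name, min_perturbation(w, 0.0, x, 2), min_perturbation(w, 0.0, x, math.inf))
```

Both vectors classify x identically, but the spread solution needs a perturbation of about 2.12 in L2 norm (1.5 in L∞) to flip the prediction, while the sparse one flips at 1.0 under both norms.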
Abstract
Deep learning has revolutionized machine learning and artificial intelligence, achieving superhuman performance on several standard benchmarks. It is well known that deep learning models are inefficient to train: they learn by processing millions of training examples multiple times and require powerful computational resources to process large batches of data in parallel rather than sequentially. Deep learning models also have unexpected failure modes: they can be fooled into misbehaving, producing unexpectedly incorrect predictions. In this thesis, we study approaches to improve the training efficiency and robustness of deep learning models. In the context of learning visual-semantic embeddings, we find that prioritizing learning on more informative training data increases convergence speed and improves generalization performance on test data. We formalize a simple trick…
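As an illustration of prioritizing informative data in visual-semantic embeddings, the sketch below assumes one common formulation of hard negative mining for image-caption retrieval, the max-of-hinges triplet loss, rather than quoting the thesis. Given a batch similarity matrix whose diagonal holds the matching image-caption pairs, the usual sum-of-hinges loss penalizes every violating negative, while the hard-negative variant keeps only the largest violation per query. Both are computed from the same matrix, which is consistent with the finding that hard negative mining adds no computational overhead. The function name `contrastive_losses` and the margin value are hypothetical.

```python
def contrastive_losses(sim, margin=0.2):
    """Return (sum_of_hinges, max_of_hinges) for a square similarity matrix.

    sim[i][j] is the similarity between image i and caption j; matching
    pairs lie on the diagonal, every off-diagonal entry is a negative.
    """
    n = len(sim)
    sum_loss = 0.0
    max_loss = 0.0
    for i in range(n):
        pos = sim[i][i]
        # Hinge cost of each negative caption for image i (row i).
        cap_costs = [max(0.0, margin - pos + sim[i][j]) for j in range(n) if j != i]
        # Hinge cost of each negative image for caption i (column i).
        img_costs = [max(0.0, margin - pos + sim[j][i]) for j in range(n) if j != i]
        # Standard loss: sum over all negatives in the batch.
        sum_loss += sum(cap_costs) + sum(img_costs)
        # Hard negative mining: keep only the hardest (largest) violation.
        max_loss += max(cap_costs, default=0.0) + max(img_costs, default=0.0)
    return sum_loss, max_loss


sim = [
    [3.0, 2.0, 1.0],
    [0.0, 2.0, 2.0],
    [1.0, 3.0, 2.0],
]
print(contrastive_losses(sim, margin=1.0))  # (7.0, 6.0)
```

The hard-negative loss is never larger than the summed loss, but its gradient concentrates on the single most confusable negative instead of averaging over many easy ones.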
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning · Advanced Neural Network Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
