Smaller, Faster, Cheaper: Architectural Designs for Efficient Machine Learning
Steven Walton

TL;DR
This dissertation explores architectural principles for building smaller, faster, and more resource-efficient machine learning models, focusing on data handling, modified attention mechanisms, and normalizing flows.
Contribution
It introduces novel architectural strategies, including data ingress/egress optimization, restricted attention in vision transformers, and the use of normalizing flows for improved efficiency.
Findings
Enhanced data utilization improves model performance.
Restricted attention increases neural expressivity.
Normalizing flows aid model distillation.
Abstract
Major advancements in the capabilities of computer vision models have been primarily fueled by rapid expansion of datasets, model parameters, and computational budgets, leading to ever-increasing demands on computational infrastructure. However, as these models are deployed in increasingly diverse and resource-constrained environments, there is a pressing need for architectures that can deliver high performance while requiring fewer computational resources. This dissertation focuses on architectural principles through which models can achieve increased performance while reducing their computational demands. We discuss strides towards this goal through three directions. First, we focus on data ingress and egress, investigating how information may be passed into and retrieved from our core neural processing units. This ensures that our models make the most of available data, allowing…
