A Brief Introduction to Automatic Differentiation for Machine Learning

Davan Harrison

arXiv:2110.06209 · cs.LG · October 18, 2021

Open Access

TL;DR

This paper introduces automatic differentiation, a key technique in machine learning frameworks like TensorFlow and PyTorch, explaining its motivations, implementations, and role in enabling efficient gradient-based optimization for neural networks.

Contribution

The paper gives an introductory overview of automatic differentiation. It covers the motivations for AD, several implementation approaches, and practical examples using popular frameworks.

Findings

1. Automatic differentiation simplifies derivative calculations in neural network training.
2. Different implementation approaches of AD are discussed and compared.
3. Practical examples demonstrate AD's application in TensorFlow and PyTorch.
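Operator overloading is a common way to implement AD. The following is a minimal sketch of forward-mode AD with dual numbers, written in plain Python. It is not the paper's code or any framework's API; `Dual`, `sin`, and `derivative` are hypothetical names chosen for illustration. Each value carries its derivative alongside it. Each overloaded operator applies the matching differentiation rule (sum rule, product rule, chain rule), so the derivative is exact up to floating-point rounding and needs no symbolic manipulation.

```python
import math
from dataclasses import dataclass


@dataclass
class Dual:
    """Hypothetical dual number: a primal value plus its tangent (derivative)."""
    val: float  # value of the expression
    dot: float  # derivative of the expression w.r.t. the seeded input

    @staticmethod
    def _lift(other):
        # Treat plain numbers as constants (derivative 0).
        return other if isinstance(other, Dual) else Dual(float(other), 0.0)

    def __add__(self, other):
        other = Dual._lift(other)
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual._lift(other)
        # Product rule: (u * v)' = u' v + u v'
        return Dual(self.val * other.val, self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__


def sin(x: Dual) -> Dual:
    # Chain rule: (sin u)' = cos(u) * u'
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)


def derivative(f, x: float) -> float:
    # Seed the input with tangent 1.0 and read the derivative off the output.
    return f(Dual(x, 1.0)).dot


def f(x):
    return x * x + 3 * sin(x)  # analytically, f'(x) = 2x + 3 cos(x)


print(derivative(f, 2.0))
```

Forward mode needs one pass per input variable. For this reason, frameworks that train networks with many parameters generally rely on reverse mode for gradients of a scalar loss.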

Abstract

Machine learning, and neural network models in particular, have been improving state-of-the-art performance on many artificial intelligence tasks. Neural network models are typically implemented using frameworks that perform gradient-based optimization to fit a model to a dataset. These frameworks calculate derivatives with a technique called automatic differentiation (AD), which removes the burden of performing derivative calculations from the model designer. In this report we describe AD, its motivations, and different implementation approaches. We briefly describe dataflow programming as it relates to AD. Lastly, we present example programs implemented with TensorFlow and PyTorch, two commonly used AD frameworks.
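Reverse-mode AD is the mode behind backpropagation. It can be viewed as running a program over a dataflow (computation) graph. The forward pass records each operation together with its local derivatives. A backward pass then propagates derivatives from the output to the inputs in reverse topological order. The sketch below is a toy illustration of that idea in plain Python. `Var`, `sin`, and `backward` are hypothetical names, and this is not how TensorFlow or PyTorch are implemented internally.

```python
import math


class Var:
    """Hypothetical graph node: a value, its parents with local derivatives, and a gradient."""

    def __init__(self, value, parents=()):
        self.value = float(value)
        self.parents = parents  # tuple of (parent Var, d(self)/d(parent))
        self.grad = 0.0         # d(output)/d(self), filled in by backward()

    @staticmethod
    def _lift(other):
        # Plain numbers become constant leaf nodes.
        return other if isinstance(other, Var) else Var(other)

    def __add__(self, other):
        other = Var._lift(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    __radd__ = __add__

    def __mul__(self, other):
        other = Var._lift(other)
        return Var(self.value * other.value, ((self, other.value), (other, self.value)))

    __rmul__ = __mul__


def sin(x: Var) -> Var:
    return Var(math.sin(x.value), ((x, math.cos(x.value)),))


def backward(output: Var) -> None:
    # Topologically sort the graph so every node's gradient is complete
    # before it is pushed to that node's parents.
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    output.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            # Chain rule, accumulated over every path from output to parent.
            parent.grad += local * node.grad


x, y = Var(2.0), Var(3.0)
z = x * y + sin(x)  # dz/dx = y + cos(x), dz/dy = x
backward(z)
print(x.grad, y.grad)
```

A single backward pass yields the derivative of `z` with respect to every input. `x` is used twice, and its gradient correctly sums both contributions. Because `backward` accumulates into `grad`, the gradients would need resetting before running it again on the same graph.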

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Neural Networks and Applications · Advanced Data Processing Techniques