Caffe con Troll: Shallow Ideas to Speed Up Deep Learning

Stefan Hadjis; Firas Abuzaid; Ce Zhang; Christopher Ré

arXiv:1504.04343·cs.LG·May 28, 2015·23 cites

TL;DR

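This paper introduces Caffe con Troll (CcT), a fully compatible version of Caffe with rebuilt internals. Standard batching optimizations give it a 4.5x CPU training throughput improvement over Caffe on popular networks like CaffeNet. With those improvements, end-to-end training time tracks the FLOPS the CPU delivers, which makes efficient hybrid CPU-GPU training possible.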

Contribution

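We built Caffe con Troll (CcT), a fully compatible end-to-end version of Caffe with rebuilt internals, to examine how CNN training and deployment perform across hardware architectures. It achieves a 4.5x CPU throughput improvement over Caffe and enables efficient hybrid CPU-GPU CNN training.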

Findings

01. 4.5x throughput improvement over Caffe on popular networks
02. End-to-end training time proportional to CPU FLOPS
03. Efficient hybrid CPU-GPU CNN training enabled
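
If per-device training time really does track delivered FLOPS, one simple way to use that in a hybrid system is to give each device a share of the mini-batch proportional to its FLOPS. The sketch below illustrates that idea only; the function name `split_batch`, the device names, and the FLOPS figures are hypothetical and are not taken from the paper or from the CcT code.

```python
def split_batch(batch_size, device_flops):
    """Split batch_size images across devices in proportion to their FLOPS.

    device_flops maps a device name to its (assumed) delivered FLOPS.
    Largest-remainder rounding keeps the shares summing to batch_size.
    """
    total = sum(device_flops.values())
    exact = {d: batch_size * f / total for d, f in device_flops.items()}
    shares = {d: int(x) for d, x in exact.items()}
    leftover = batch_size - sum(shares.values())
    # Hand the remaining images to the devices with the largest fractional parts.
    by_fraction = sorted(exact, key=lambda d: exact[d] - shares[d], reverse=True)
    for d in by_fraction[:leftover]:
        shares[d] += 1
    return shares


# Hypothetical machine: a GPU assumed to deliver 3x the FLOPS of the CPU.
shares = split_batch(256, {"cpu": 1.0e12, "gpu": 3.0e12})
print(shares)
```

With a split like this, both devices should finish their portion of the batch at roughly the same time, provided the proportionality between FLOPS and training time holds on the hardware in question.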

Abstract

We present Caffe con Troll (CcT), a fully compatible end-to-end version of the popular framework Caffe with rebuilt internals. We built CcT to examine the performance characteristics of training and deploying general-purpose convolutional neural networks across different hardware architectures. We find that, by employing standard batching optimizations for CPU training, we achieve a 4.5x throughput improvement over Caffe on popular networks like CaffeNet. Moreover, with these improvements, the end-to-end training time for CNNs is directly proportional to the FLOPS delivered by the CPU, which enables us to efficiently train hybrid CPU-GPU systems for CNNs.
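
To make "batching" concrete: a common way to compute a convolution layer is to lower each image into a matrix of patches and multiply it by the kernel matrix. The sketch below contrasts lowering and multiplying one image at a time with lowering the whole batch into one tall matrix and issuing a single large multiply. It is a pure-Python illustration under simplifying assumptions (square single-channel images, stride 1, no padding, no kernel flip), not the CcT implementation; `matmul` stands in for a BLAS GEMM call, and all function names are hypothetical.

```python
def im2col(image, k):
    """Lower one square 2-D image into rows of flattened k x k patches."""
    m = len(image) - k + 1  # output height/width for stride 1, no padding
    return [
        [image[r + i][c + j] for i in range(k) for j in range(k)]
        for r in range(m)
        for c in range(m)
    ]


def matmul(a, b):
    """Plain matrix multiply on lists of lists (stand-in for a GEMM call)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def kernel_matrix(kernels, k):
    """Stack kernels as columns: shape (k*k) x num_kernels."""
    return [[kern[i][j] for kern in kernels] for i in range(k) for j in range(k)]


def conv_per_image(images, kernels, k):
    """One small matrix multiply per image."""
    w = kernel_matrix(kernels, k)
    return [matmul(im2col(img, k), w) for img in images]


def conv_batched(images, kernels, k):
    """Lower the whole batch into one tall matrix, then multiply once."""
    w = kernel_matrix(kernels, k)
    lowered = [row for img in images for row in im2col(img, k)]
    out = matmul(lowered, w)
    n = len(out) // len(images)  # output rows belonging to each image
    return [out[i * n:(i + 1) * n] for i in range(len(images))]


images = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[0, 0, 0], [0, 1, 0], [0, 0, 0]],
]
kernels = [[[1, 0], [0, 1]], [[1, 1], [1, 1]]]
per_image = conv_per_image(images, kernels, 2)
batched = conv_batched(images, kernels, 2)
```

Both paths produce the same numbers. The practical difference is that the batched path hands the matrix-multiply routine one large, regular problem, which optimized GEMM libraries typically handle more efficiently than many small ones.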

Code & Models

Repositories

HazyResearch/CaffeConTroll (Official)

Taxonomy

Topics: Video Analysis and Summarization · Generative Adversarial Networks and Image Synthesis