Performance tuning for deep learning on a many-core processor (Master's thesis)
Philippos Papaphilippou

TL;DR
This thesis explores performance optimization techniques for CNN applications on the Loki many-core processor, focusing on code efficiency, portability across configurations, and adaptive algorithms to enhance performance.
Contribution
It introduces optimization strategies tailored for Loki's architecture, evaluates their portability, and investigates adaptive algorithms for improved CNN performance.
Findings
Optimizations significantly improve CNN performance on Loki.
Portability of optimizations varies with configurations and inputs.
Adaptive algorithms offer potential for further performance gains.
Abstract
Convolutional neural networks (CNNs) are becoming increasingly successful and popular across a variety of applications. The Loki many-core processor architecture is a promising way to approach the performance and efficiency of specialised hardware while remaining a general-purpose solution. Loki combines many simple cores with increased control for the programmer. This freedom can be exploited to produce much more efficient code than on conventional multiprocessors, but it also creates a very large design space of possible optimisations. In this project, I explore possible optimisations for a CNN application and their portability across different Loki-specific configurations, convolution parameters and inputs. Finally, I investigate the potential of adaptive algorithms to increase performance further.
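The abstract stays at a high level, so the following is only an illustrative sketch and is not taken from the thesis. It assumes a stride-1, unpadded direct convolution, a static block partition of work across `num_cores` cores (simulated sequentially here), and a hypothetical adaptive rule, `choose_partition`, that decides whether to split the work by output channel or by output row based on the layer shape. All function names are hypothetical. The point it illustrates is why an optimisation tuned for one configuration may not port to another: the best way to divide a layer depends on both the layer's parameters and the number of cores.

```python
from typing import List, Tuple

# Hypothetical layouts: input[ic][row][col], weights[oc][ic][ky][kx].
Tensor3 = List[List[List[float]]]
Tensor4 = List[List[List[List[float]]]]


def conv_output(inp: Tensor3, weights: Tensor4, oc: int, y: int, x: int, k: int) -> float:
    """One output element of a stride-1, unpadded k x k convolution."""
    acc = 0.0
    for ic in range(len(inp)):
        for ky in range(k):
            for kx in range(k):
                acc += inp[ic][y + ky][x + kx] * weights[oc][ic][ky][kx]
    return acc


def block_ranges(n: int, num_cores: int) -> List[Tuple[int, int]]:
    """Static block partition of range(n): core i gets [lo, hi), possibly empty."""
    chunk = -(-n // num_cores)  # ceiling division
    return [(min(i * chunk, n), min((i + 1) * chunk, n)) for i in range(num_cores)]


def efficiency(n: int, num_cores: int) -> float:
    """Fraction of core time spent on useful work when n equal tasks are block-partitioned."""
    return n / (num_cores * -(-n // num_cores))


def choose_partition(out_channels: int, out_rows: int, num_cores: int) -> str:
    """Hypothetical adaptive rule: split along whichever dimension balances better."""
    if efficiency(out_channels, num_cores) >= efficiency(out_rows, num_cores):
        return "channels"
    return "rows"


def conv2d_partitioned(inp: Tensor3, weights: Tensor4, num_cores: int) -> Tuple[Tensor3, str]:
    """Direct convolution with work split across num_cores (run sequentially in this sketch)."""
    out_channels, k = len(weights), len(weights[0][0])
    out_h, out_w = len(inp[0]) - k + 1, len(inp[0][0]) - k + 1
    out = [[[0.0] * out_w for _ in range(out_h)] for _ in range(out_channels)]
    dim = choose_partition(out_channels, out_h, num_cores)
    n = out_channels if dim == "channels" else out_h
    # Each (lo, hi) range would run on its own core; here the ranges run one after another.
    for lo, hi in block_ranges(n, num_cores):
        for i in range(lo, hi):
            channels = [i] if dim == "channels" else range(out_channels)
            rows = range(out_h) if dim == "channels" else [i]
            for oc in channels:
                for y in rows:
                    for x in range(out_w):
                        out[oc][y][x] = conv_output(inp, weights, oc, y, x, k)
    return out, dim
```

For example, `choose_partition(4, 6, 8)` returns `"rows"`: with only 4 output channels on 8 cores, splitting by channel would leave half the cores idle, while splitting the 6 output rows keeps 6 of them busy. A static block partition is used because it is the simplest scheme to reason about; a real many-core implementation would also have to account for data movement and memory locality, which this sketch ignores.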
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Advanced Memory and Neural Computing
