Exploring the Vision Processing Unit as Co-processor for Inference
Sergio Rivas-Gomez, Antonio J. Peña, David Moloney, Erwin Laure, Stefano Markidis

TL;DR
This paper investigates using Vision Processing Units as low-power co-processors for inference tasks, demonstrating comparable performance to CPUs and GPUs with significantly reduced power consumption.
Contribution
It explores the integration of VPUs in HPC for inference, showing their potential to reduce power while maintaining performance.
Findings
A multi-VPU configuration achieves inference performance similar to the reference CPU and GPU implementations.
The multi-VPU configuration reduces thermal design power (TDP) by up to 8x compared with those references.
These preliminary results support VPUs as energy-efficient inference accelerators.
Abstract
It is widely argued that the success of the exascale supercomputer depends on novel breakthroughs in technology that effectively reduce power consumption and thermal dissipation requirements. In this work, we consider the integration of co-processors in high-performance computing (HPC) to enable low-power, seamless computation offloading of certain operations. In particular, we explore the so-called Vision Processing Unit (VPU), a highly parallel vector processor with a power envelope of less than 1 W. We evaluate this chip during inference using a pre-trained GoogLeNet convolutional network model and a large image dataset from the ImageNet ILSVRC challenge. Preliminary results indicate that a multi-VPU configuration provides performance similar to reference CPU and GPU implementations, while reducing the thermal design power (TDP) by up to 8x in comparison.
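The TDP comparison in the abstract comes down to simple arithmetic, which the following minimal sketch illustrates. It is not the authors' evaluation code. The function names and all numeric values are hypothetical placeholders rather than measurements from the paper, and the sketch assumes the TDP of a multi-VPU configuration is simply the sum of the per-chip power envelopes, which ignores any host or interconnect overhead.

```python
def tdp_reduction(reference_tdp_watts: float, vpu_count: int, watts_per_vpu: float) -> float:
    """Return how many times lower the aggregate VPU TDP is than the reference TDP.

    Assumption: the multi-VPU TDP is vpu_count * watts_per_vpu.
    """
    if reference_tdp_watts <= 0 or vpu_count <= 0 or watts_per_vpu <= 0:
        raise ValueError("all arguments must be positive")
    return reference_tdp_watts / (vpu_count * watts_per_vpu)


def images_per_joule(images_per_second: float, tdp_watts: float) -> float:
    """Throughput normalized by TDP, a rough proxy for energy efficiency."""
    if tdp_watts <= 0:
        raise ValueError("tdp_watts must be positive")
    return images_per_second / tdp_watts


if __name__ == "__main__":
    # Hypothetical placeholder values, NOT results reported in the paper.
    reference_tdp = 80.0  # watts, an imagined reference processor
    vpus = 8              # imagined number of VPUs in the configuration
    per_vpu = 1.0         # watts, treating the "less than 1 W" envelope as an upper bound
    factor = tdp_reduction(reference_tdp, vpus, per_vpu)
    print(f"Aggregate VPU TDP is {factor:.1f}x lower than the reference")
```

Normalizing throughput by TDP, as in `images_per_joule`, is one way to compare devices that reach similar raw inference performance at very different power budgets.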
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Advanced Memory and Neural Computing
