# Comparing Energy Efficiency of CPU, GPU and FPGA Implementations for Vision Kernels

**Authors:** Murad Qasaimeh, Kristof Denolf, Jack Lo, Kees Vissers, Joseph Zambreno, and Phillip H. Jones

arXiv: 1906.11879 · 2019-07-01

## TL;DR

This paper benchmarks the run-time performance and energy efficiency of CPU, GPU, and FPGA platforms on embedded vision kernels. It finds that the GPU is the most energy-efficient platform for simple kernels, while the FPGA is the most energy-efficient for more complicated kernels and complete vision pipelines.

## Contribution

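It provides a comprehensive comparison of run-time performance and energy efficiency across three embedded platforms (ARM57 CPU, Jetson TX2 GPU, and ZCU102 FPGA), each using its vendor-optimized vision library (OpenCV, VisionWorks, and xfOpenCV, respectively). It also discusses why each hardware architecture performs well or poorly for different categories of vision kernels, to guide hardware selection for embedded vision applications.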

## Key findings

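- For simple kernels, the GPU achieves an energy/frame reduction ratio of 1.1-3.2x compared to the CPU and FPGA
- For more complicated kernels and complete vision pipelines, the FPGA outperforms the CPU and GPU with energy/frame reduction ratios of 1.2-22.3x
- The FPGA's advantage grows as the complexity of the vision pipeline increases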
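
To make the "energy/frame reduction ratio" in these findings concrete, the following is a minimal sketch. It assumes energy per frame is the average power draw multiplied by the per-frame processing time, and that a reduction ratio is one platform's energy per frame divided by that of the most efficient platform. The function names and all power and timing values are hypothetical placeholders for illustration, not measurements from the paper.

```python
def energy_per_frame_mj(avg_power_w: float, frame_time_ms: float) -> float:
    """Energy per frame in millijoules (watts * milliseconds = millijoules)."""
    return avg_power_w * frame_time_ms


def reduction_ratio(other_mj: float, best_mj: float) -> float:
    """How many times more energy per frame `other` uses than `best`."""
    return other_mj / best_mj


# Hypothetical (average power in W, time per frame in ms) for one kernel.
# These numbers are made up for illustration only.
measurements = {
    "cpu": (4.0, 10.0),
    "gpu": (6.0, 2.5),
    "fpga": (5.0, 2.0),
}

# Energy per frame for each platform.
energy = {
    name: energy_per_frame_mj(power, time)
    for name, (power, time) in measurements.items()
}

# The platform with the lowest energy per frame is the reference point.
best = min(energy, key=energy.get)

# Reduction ratio of the best platform relative to each platform.
ratios = {name: reduction_ratio(e, energy[best]) for name, e in energy.items()}

for name in measurements:
    print(f"{name}: {energy[name]:.1f} mJ/frame, {ratios[name]:.1f}x vs {best}")
```

Note that in this made-up example the platform with the lowest power draw is not the most energy-efficient one: a faster platform can spend less energy per frame even at higher power, which is why energy per frame rather than power alone is the quantity being compared.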

## Abstract

Developing high-performance embedded vision applications requires balancing run-time performance with energy constraints. Given the mix of hardware accelerators that exist for embedded computer vision (e.g., multi-core CPUs, GPUs, and FPGAs), and their associated vendor-optimized vision libraries, it becomes a challenge for developers to navigate this fragmented solution space. To aid with determining which embedded platform is most suitable for their application, we conduct a comprehensive benchmark of the run-time performance and energy efficiency of a wide range of vision kernels. We discuss rationales for why a given underlying hardware architecture innately performs well or poorly based on the characteristics of a range of vision kernel categories. Specifically, our study is performed for three commonly used HW accelerators for embedded vision applications: ARM57 CPU, Jetson TX2 GPU and ZCU102 FPGA, using their vendor-optimized vision libraries: OpenCV, VisionWorks and xfOpenCV. Our results show that the GPU achieves an energy/frame reduction ratio of 1.1-3.2x compared to the others for simple kernels, while for more complicated kernels and complete vision pipelines, the FPGA outperforms the others with energy/frame reduction ratios of 1.2-22.3x. It is also observed that the FPGA performs increasingly better as a vision application's pipeline complexity grows.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.11879/full.md

## Figures

14 figures with captions in the complete paper: https://tomesphere.com/paper/1906.11879/full.md

## References

17 references — full list in the complete paper: https://tomesphere.com/paper/1906.11879/full.md

---
Source: https://tomesphere.com/paper/1906.11879