# Decoupled Access-Execute on ARM big.LITTLE

**Authors:** Anton Weber, Kim-Anh Tran, Stefanos Kaxiras, Alexandra Jimborean

arXiv: 1701.05478 · 2017-01-20

## TL;DR

This paper investigates whether decoupled access-execute (DAE) can improve energy efficiency and performance on ARM big.LITTLE by running memory-bound Access phases on the LITTLE core and compute-bound Execute phases on the big core.

## Contribution

The paper applies DAE to ARM big.LITTLE and reports preliminary results that suggest energy savings and performance gains from scheduling the Access and Execute phases on different cores.

## Key findings

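- Up to 37% IPC improvement in the Execute phase when data is prefetched in the Access phase
- More than half of the program runtime shifted to the LITTLE core
- Results are preliminary; locking overhead is identified as a cost that still needs to be overcome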

## Abstract

Energy efficiency plays a significant role given the battery lifetime constraints in embedded systems and hand-held devices. In this work we target ARM big.LITTLE, a heterogeneous platform that is dominant in the mobile and embedded market and that allows code to run transparently on different microarchitectures with individual energy and performance characteristics. It allows the use of more energy-efficient cores to conserve power during simple tasks and idle times, and a switch to faster, more power-hungry cores when performance is needed. This proposal explores the power savings and the performance gains that can be achieved by utilizing the ARM big.LITTLE core in combination with Decoupled Access-Execute (DAE). DAE is a compiler technique that splits code regions into two distinct phases: a memory-bound Access phase and a compute-bound Execute phase. By scheduling the memory-bound phase on the LITTLE core and the compute-bound phase on the big core, we conserve energy while caching data from main memory and perform computations at maximum performance. Our preliminary findings show that applying DAE on ARM big.LITTLE has potential. By prefetching data in Access we can achieve an IPC improvement of up to 37% in the Execute phase, and manage to shift more than half of the program runtime to the LITTLE core. We also provide insight into advantages and disadvantages of our approach, present preliminary results and discuss potential solutions to overcome locking overhead.
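The Access/Execute split described in the abstract can be illustrated with a small hand-written sketch. This is not the paper's compiler output: the function names (`access_phase`, `execute_phase`, `dae_dot`), the chunk size `CHUNK`, and the assumed 64-byte cache line are all hypothetical choices made for illustration. The sketch also runs both phases on one thread, so it does not model placing Access on the LITTLE core and Execute on the big core, nor the synchronization that such placement would need. It relies on the GCC/Clang builtin `__builtin_prefetch`.

```c
#include <stddef.h>

/* Hypothetical chunk size; in practice it would be tuned to the cache. */
#define CHUNK 1024

/* Access phase: only request the data the Execute phase will need.
 * Assumes 64-byte cache lines, i.e. 8 doubles per line. */
static void access_phase(const double *a, const double *b,
                         size_t begin, size_t end)
{
    for (size_t i = begin; i < end; i += 8) {
        __builtin_prefetch(&a[i], 0, 3); /* read access, high locality */
        __builtin_prefetch(&b[i], 0, 3);
    }
}

/* Execute phase: the original computation, ideally on cached data. */
static double execute_phase(const double *a, const double *b,
                            size_t begin, size_t end)
{
    double sum = 0.0;
    for (size_t i = begin; i < end; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Dot product processed chunk by chunk: Access first, then Execute. */
double dae_dot(const double *a, const double *b, size_t n)
{
    double total = 0.0;
    for (size_t begin = 0; begin < n; begin += CHUNK) {
        size_t end = (begin + CHUNK < n) ? begin + CHUNK : n;
        access_phase(a, b, begin, end);
        total += execute_phase(a, b, begin, end);
    }
    return total;
}
```

Calling `dae_dot(a, b, n)` returns the same value as a plain dot-product loop; the only difference is that each chunk is prefetched before it is computed on. Keeping the two phases as separate functions mirrors the idea that each phase could, in principle, be scheduled on a different core.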

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1701.05478/full.md

## Figures

19 figures with captions in the complete paper: https://tomesphere.com/paper/1701.05478/full.md

## References

17 references — full list in the complete paper: https://tomesphere.com/paper/1701.05478/full.md

---
Source: https://tomesphere.com/paper/1701.05478