# Analytical Performance Models for NoCs with Multiple Priority Traffic Classes

**Authors:** Sumit K. Mandal, Raid Ayoub, Michael Kishinevsky, Umit Y. Ogras

arXiv: 1908.02408 · 2020-01-07

## TL;DR

This paper introduces priority-aware analytical performance models for networks-on-chip (NoCs) with multiple priority classes, enabling latency estimation that is faster than cycle-accurate simulation while remaining accurate.

## Contribution

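The authors develop two novel transformations of the queuing system and an algorithm that applies them iteratively to decompose a priority-based NoC into individual queues with modified service times. This yields end-to-end latency estimates for designs that existing analytical models, which assume fair arbitration, cannot handle.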
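To illustrate why a fair-arbitration model cannot stand in for a priority-aware one, the sketch below compares the two on a single queue. It uses the textbook mean-waiting-time formula for a non-preemptive M/G/1 priority queue, which is an assumption chosen for illustration and not the paper's transformations or algorithm. The function names and the example traffic values are hypothetical.

```python
from typing import List, Sequence


def priority_waiting_times(
    arrival_rates: Sequence[float],
    mean_service: Sequence[float],
    second_moment_service: Sequence[float],
) -> List[float]:
    """Mean queueing delay per class in one non-preemptive M/G/1 priority queue.

    Index 0 is the highest-priority class. This is the classic textbook
    result, used here only as an illustration (not the paper's model).
    """
    rho = [lam * s for lam, s in zip(arrival_rates, mean_service)]
    if sum(rho) >= 1.0:
        raise ValueError("total utilization must be below 1 for a stable queue")

    # Mean residual service time seen by an arriving packet.
    w0 = 0.5 * sum(lam * s2 for lam, s2 in zip(arrival_rates, second_moment_service))

    waits = []
    sigma_prev = 0.0  # utilization of all strictly higher-priority classes
    for r in rho:
        sigma = sigma_prev + r  # utilization up to and including this class
        waits.append(w0 / ((1.0 - sigma_prev) * (1.0 - sigma)))
        sigma_prev = sigma
    return waits


def fcfs_waiting_time(
    arrival_rates: Sequence[float],
    mean_service: Sequence[float],
    second_moment_service: Sequence[float],
) -> float:
    """Mean queueing delay when all classes share one first-come-first-served queue."""
    rho = sum(lam * s for lam, s in zip(arrival_rates, mean_service))
    if rho >= 1.0:
        raise ValueError("total utilization must be below 1 for a stable queue")
    w0 = 0.5 * sum(lam * s2 for lam, s2 in zip(arrival_rates, second_moment_service))
    return w0 / (1.0 - rho)


if __name__ == "__main__":
    # Hypothetical traffic: two classes, 0.2 packets/cycle each,
    # deterministic one-cycle service (E[S] = 1, E[S^2] = 1).
    rates = [0.2, 0.2]
    mean_s = [1.0, 1.0]
    second_s = [1.0, 1.0]
    print("per-class waits with priority:", priority_waiting_times(rates, mean_s, second_s))
    print("single wait with FCFS:", fcfs_waiting_time(rates, mean_s, second_s))
```

Under these assumed inputs the high-priority class waits less and the low-priority class waits more than the single first-come-first-served average, so a model that ignores priorities misestimates both classes even on one queue.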

## Key findings

- Achieves 97% accuracy in latency estimation in experiments on real architectures and applications.
- Speeds up full-system simulation by up to 2.5x.
- Effectively models multiple priority classes in NoCs.

## Abstract

Networks-on-chip (NoCs) have become the standard for interconnect solutions in industrial designs ranging from client CPUs to many-core chip-multiprocessors. Since NoCs play a vital role in system performance and power consumption, pre-silicon evaluation environments include cycle-accurate NoC simulators. Long simulations increase the execution time of evaluation frameworks, which are already notoriously slow, and prohibit design-space exploration. Existing analytical NoC models, which assume fair arbitration, cannot replace these simulations since industrial NoCs typically employ priority schedulers and multiple priority classes. To address this limitation, we propose a systematic approach to construct priority-aware analytical performance models using micro-architecture specifications and input traffic. Our approach consists of developing two novel transformations of the queuing system and designing an algorithm which iteratively uses these two transformations to estimate end-to-end latency. Our approach decomposes the given NoC into individual queues with modified service time to enable accurate and scalable latency computations. Specifically, we introduce novel transformations along with an algorithm that iteratively applies these transformations to decompose the queuing system. Experimental evaluations using real architectures and applications show high accuracy of 97% and up to 2.5x speedup in full-system simulation.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1908.02408/full.md

## Figures

21 figures with captions in the complete paper: https://tomesphere.com/paper/1908.02408/full.md

## References

35 references — full list in the complete paper: https://tomesphere.com/paper/1908.02408/full.md

---
Source: https://tomesphere.com/paper/1908.02408