# XSP: Across-Stack Profiling and Analysis of Machine Learning Models on GPUs

**Authors:** Cheng Li, Abdul Dakkak, Jinjun Xiong, Wei Wei, Lingjie Xu, Wen-mei Hwu

arXiv: 1908.06869 · 2020-06-04

## TL;DR

XSP is an across-stack profiling design that uses distributed tracing to aggregate and correlate profile data from the model, framework, system library, and hardware levels, giving a holistic and hierarchical view of ML model execution on GPUs.

## Contribution

The paper introduces XSP, an across-stack profiling design that correlates profile data from otherwise disjoint, single-level profilers. It pairs this design with a leveled and iterative measurement approach and an automated analysis pipeline.

## Key findings

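- XSP's leveled and iterative measurement approach accurately captures latencies at every level of the HW/SW stack despite profiling overhead.
- XSP provides insights into ML model performance that would be difficult to discern with existing tools, which each profile only one level of the stack.
- An automated analysis pipeline built on XSP systematically analyzes 65 state-of-the-art ML models.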
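
The leveled measurement idea can be illustrated with a minimal sketch. It assumes one plausible reading of the abstract: the same workload is run several times with profiling enabled to increasing depth, each level's latency is taken from the run in which that level is the deepest one profiled, and the overhead of deeper profiling is the growth in model-level latency. The level numbering, the function names, and all latency values are hypothetical and are not taken from the paper.

```python
from typing import Dict

# runs[d][l]: latency in ms of the level-l span, measured in a run with
# profiling enabled for levels 1..d.
# Hypothetical numbering: 1 = model, 2 = layer, 3 = GPU kernel.
Runs = Dict[int, Dict[int, float]]


def accurate_latencies(runs: Runs) -> Dict[int, float]:
    """Take each level's latency from the run where it is the deepest
    profiled level, so it is not inflated by profilers below it."""
    return {depth: by_level[depth] for depth, by_level in runs.items()}


def profiling_overheads(runs: Runs) -> Dict[int, float]:
    """Estimate the overhead of profiling to each depth as the increase in
    model-level latency relative to the model-only run."""
    baseline = runs[1][1]
    return {depth: by_level[1] - baseline for depth, by_level in runs.items()}


# Illustrative values only (assumption), not measurements from the paper.
runs = {
    1: {1: 100.0},
    2: {1: 104.0, 2: 95.0},
    3: {1: 130.0, 2: 120.0, 3: 80.0},
}

latencies = accurate_latencies(runs)
overheads = profiling_overheads(runs)
```

In this toy data, enabling GPU-level profiling inflates the measured layer latency from 95 ms to 120 ms, which is why the sketch reads the layer latency from the shallower run.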

## Abstract

There has been a rapid proliferation of machine learning/deep learning (ML) models and wide adoption of them in many application domains. This has made profiling and characterization of ML model performance an increasingly pressing task for both hardware designers and system providers, as they would like to offer the best possible system to serve ML models with the target latency, throughput, cost, and energy requirements while maximizing resource utilization. Such an endeavor is challenging as the characteristics of an ML model depend on the interplay between the model, framework, system libraries, and the hardware (or the HW/SW stack). Existing profiling tools are disjoint, however, and only focus on profiling within a particular level of the stack, which limits the thoroughness and usefulness of the profiling results. This paper proposes XSP, an across-stack profiling design that gives a holistic and hierarchical view of ML model execution. XSP leverages distributed tracing to aggregate and correlate profile data from different sources. XSP introduces a leveled and iterative measurement approach that accurately captures the latencies at all levels of the HW/SW stack in spite of the profiling overhead. We couple the profiling design with an automated analysis pipeline to systematically analyze 65 state-of-the-art ML models. We demonstrate that XSP provides insights which would be difficult to discern otherwise.
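
To make the idea of correlating profile data across stack levels concrete, here is a minimal sketch. It assumes each profiler emits timed spans tagged with a stack level, and that a span's parent is the tightest span one level up whose time interval contains it. The `Span` type, the `correlate` function, the level numbering, and the timestamps are all hypothetical; the summary does not specify how XSP performs this step.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    name: str
    level: int    # hypothetical numbering: 1 = model, 2 = layer, 3 = GPU kernel
    start: float  # start timestamp (ms)
    end: float    # end timestamp (ms)
    parent: Optional["Span"] = field(default=None, repr=False, compare=False)
    children: List["Span"] = field(default_factory=list, repr=False, compare=False)


def correlate(spans: List[Span]) -> List[Span]:
    """Link each span to the tightest enclosing span one level above it.

    Returns the spans that have no parent (the roots of the hierarchy).
    """
    for s in spans:
        candidates = [
            p for p in spans
            if p.level == s.level - 1 and p.start <= s.start and s.end <= p.end
        ]
        if candidates:
            # Prefer the shortest enclosing interval as the parent.
            parent = min(candidates, key=lambda p: p.end - p.start)
            s.parent = parent
            parent.children.append(s)
    return [s for s in spans if s.parent is None]


# Illustrative spans only (assumption), as if collected from three profilers.
spans = [
    Span("model_predict", 1, 0.0, 10.0),
    Span("conv1", 2, 1.0, 4.0),
    Span("relu1", 2, 4.5, 6.0),
    Span("kernel_a", 3, 1.5, 3.0),
    Span("kernel_b", 3, 4.6, 5.5),
]

roots = correlate(spans)
```

This quadratic scan is chosen for readability; a real tracer handling many spans would likely use an interval index instead.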

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1908.06869/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/1908.06869/full.md

## References

27 references — full list in the complete paper: https://tomesphere.com/paper/1908.06869/full.md

---
Source: https://tomesphere.com/paper/1908.06869