Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces

Srinivas Sridharan; Taekyung Heo; Louis Feng; Zhaodong Wang; Matt Bergeron; Wenyin Fu; Shengbao Zheng; Brian Coutinho; Saeed Rashidi; Changhai Man; Tushar Krishna

arXiv:2305.14516·cs.LG·May 29, 2023·2 cites

TL;DR

Chakra introduces a standardized graph schema for execution traces to improve benchmarking and co-design of AI systems, enabling faster, more flexible, and privacy-preserving simulation and optimization.

Contribution

The paper presents Chakra, a new open schema for workload specification, along with tools for collection, generation, and adoption of execution traces for AI system benchmarking.

Findings

01. Developed Chakra, a standardized execution trace schema.
02. Created AI models to synthesize realistic execution traces.
03. Demonstrated conversion from PyTorch traces to Chakra for simulation.

Abstract

Benchmarking and co-design are essential for driving optimizations and innovation around ML models, ML software, and next-generation hardware. Full workload benchmarks, e.g., MLPerf, play an essential role in enabling fair comparison across different software and hardware stacks, especially once systems are fully designed and deployed. However, the pace of AI innovation demands a more agile methodology for benchmark creation and usage by simulators and emulators for future system co-design. We propose Chakra, an open graph schema for standardizing workload specification capturing key operations and dependencies, also known as Execution Trace (ET). In addition, we propose a complementary set of tools/capabilities to enable collection, generation, and adoption of Chakra ETs by a wide range of simulators, emulators, and benchmarks. For instance, we use generative AI models to learn latent…
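
To make the idea of an execution trace as a graph of operations and dependencies more concrete, here is a minimal Python sketch. It is not the Chakra schema: the `TraceNode` class, its fields (`kind`, `duration_us`, `deps`), the helper functions, and the sample operations and durations are all hypothetical and chosen only for illustration. The sketch shows how a consumer such as a simulator might derive a dependency-respecting replay order and a critical-path estimate from such a graph.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class TraceNode:
    """Hypothetical trace node; NOT the actual Chakra schema."""
    node_id: int
    name: str
    kind: str            # illustrative label, e.g. "compute" or "comm"
    duration_us: float   # assumed measured or estimated duration
    deps: list[int] = field(default_factory=list)  # ids of predecessor nodes


def replay_order(nodes):
    """Return node ids in an order where every node follows its dependencies."""
    sorter = TopologicalSorter({n.node_id: n.deps for n in nodes})
    return list(sorter.static_order())


def critical_path_us(nodes):
    """Longest dependency chain, assuming unlimited parallel resources."""
    by_id = {n.node_id: n for n in nodes}
    finish = {}
    for nid in replay_order(nodes):
        node = by_id[nid]
        # A node starts once its slowest dependency has finished.
        start = max((finish[d] for d in node.deps), default=0.0)
        finish[nid] = start + node.duration_us
    return max(finish.values(), default=0.0)


# Made-up example trace: a forward op, then a gradient all-reduce
# that can overlap with backward compute before the optimizer step.
trace = [
    TraceNode(0, "embedding_lookup", "compute", 40.0),
    TraceNode(1, "matmul_fwd", "compute", 120.0, deps=[0]),
    TraceNode(2, "all_reduce_grad", "comm", 200.0, deps=[1]),
    TraceNode(3, "matmul_bwd", "compute", 150.0, deps=[1]),
    TraceNode(4, "optimizer_step", "compute", 30.0, deps=[2, 3]),
]

if __name__ == "__main__":
    print(replay_order(trace))
    print(critical_path_us(trace))
```

Because the graph carries only operation metadata and dependencies, a tool built this way never needs the model weights or input data, which is consistent with the privacy-preserving angle mentioned in the TL;DR.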

Taxonomy

Topics: Software System Performance and Reliability · Scientific Computing and Data Management · Parallel Computing and Optimization Techniques