Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces
Srinivas Sridharan, Taekyung Heo, Louis Feng, Zhaodong Wang, Matt Bergeron, Wenyin Fu, Shengbao Zheng, Brian Coutinho, Saeed Rashidi, Changhai Man, Tushar Krishna

TL;DR
Chakra introduces a standardized graph schema for execution traces to improve benchmarking and co-design of AI systems, enabling faster, more flexible, and privacy-preserving simulation and optimization.
Contribution
The paper presents Chakra, a new open schema for workload specification, along with tools for collection, generation, and adoption of execution traces for AI system benchmarking.
Findings
Developed Chakra, a standardized execution trace schema.
Created AI models to synthesize realistic execution traces.
Demonstrated conversion from PyTorch traces to Chakra for simulation.
Abstract
Benchmarking and co-design are essential for driving optimizations and innovation around ML models, ML software, and next-generation hardware. Full workload benchmarks, e.g. MLPerf, play an essential role in enabling fair comparison across different software and hardware stacks especially once systems are fully designed and deployed. However, the pace of AI innovation demands a more agile methodology to benchmark creation and usage by simulators and emulators for future system co-design. We propose Chakra, an open graph schema for standardizing workload specification capturing key operations and dependencies, also known as Execution Trace (ET). In addition, we propose a complementary set of tools/capabilities to enable collection, generation, and adoption of Chakra ETs by a wide range of simulators, emulators, and benchmarks. For instance, we use generative AI models to learn latent…
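To make the idea of an execution trace as a graph of operations and dependencies more concrete, here is a minimal sketch in Python. It is not the Chakra schema: the `TraceNode` fields, the `kind` labels, and the `replay_order` helper are all hypothetical names chosen for illustration. The sketch only shows how a consumer such as a simulator might derive a dependency-respecting replay order from such a graph.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TraceNode:
    """One operation in a hypothetical execution-trace graph (not the Chakra schema)."""
    node_id: int
    name: str
    kind: str  # illustrative labels such as "compute" or "comm"
    deps: list[int] = field(default_factory=list)  # ids that must finish first


def replay_order(nodes: list[TraceNode]) -> list[int]:
    """Return node ids in an order that respects every dependency (Kahn's algorithm)."""
    known = {n.node_id for n in nodes}
    pending = {n.node_id: len(n.deps) for n in nodes}
    children: dict[int, list[int]] = {n.node_id: [] for n in nodes}
    for n in nodes:
        for dep in n.deps:
            if dep not in known:
                raise ValueError(f"node {n.node_id} depends on unknown node {dep}")
            children[dep].append(n.node_id)

    # Start with every node that has no unmet dependencies.
    ready = deque(sorted(i for i, count in pending.items() if count == 0))
    order: list[int] = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for child in children[current]:
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)

    if len(order) != len(nodes):
        raise ValueError("trace graph contains a cycle")
    return order


# A toy training step: compute, then communication, then the optimizer update.
trace = [
    TraceNode(0, "forward_matmul", "compute"),
    TraceNode(1, "backward_matmul", "compute", deps=[0]),
    TraceNode(2, "all_reduce_grads", "comm", deps=[1]),
    TraceNode(3, "optimizer_step", "compute", deps=[2]),
]
print(replay_order(trace))  # [0, 1, 2, 3]
```

Keeping dependencies as explicit edges, rather than relying on the order in which operations were recorded, is what would let different tools replay or reschedule the same workload description.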
Taxonomy
Topics: Software System Performance and Reliability · Scientific Computing and Data Management · Parallel Computing and Optimization Techniques
