Meta-Metrics and Best Practices for System-Level Inference Performance Benchmarking
Shweta Salaria, Zhuoran Liu, Nelson Mimura Gonzalez

TL;DR
The paper introduces FMwork, a structured approach for benchmarking the inference performance of large models. It uses meta-metrics and deliberate parameter selection to weigh the time and resources spent on benchmarking against the accuracy of the results.
Contribution
It presents FMwork, a framework that guides performance benchmarking with meta-metrics and best practices, reducing benchmarking cost while keeping results close to those from a larger body of measurements.
Findings
Up to 24x speedup or resource savings in benchmarking.
Reducing experiment size from 1024 to 128 tokens yields 2.7x gain.
Maintains 96.6% accuracy with fewer experiments.
Abstract
Benchmarking inference performance (speed) of Foundation Models such as Large Language Models (LLMs) involves navigating a vast experimental landscape to understand the complex interactions between hardware and software components. However, evaluating every possible test configuration is impractical, unfeasible and unnecessary. To address this challenge, we introduce FMwork, a comprehensive and methodical approach to creating a controlled testing environment that accurately reflects and characterizes performance. FMwork comprises a set of benchmarking best practices with three key components: 1) meta-metrics, 2) parameter selection, and 3) strategic cost-performance evaluation. Meta-metrics account for time and resources spent on benchmarking and the relative accuracy of the results compared to a larger body of measurements, representing the complete experimental space. FMwork…
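The two meta-metrics described in the abstract (benchmarking cost and relative accuracy against a larger body of measurements) can be illustrated with a minimal sketch. This is not FMwork's implementation: the function name `meta_metrics`, the `MetaMetrics` container, and the accuracy definition (one minus the mean absolute relative error over shared configurations) are assumptions made for illustration, and the numbers in the usage note are invented, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class MetaMetrics:
    speedup: float            # reference cost divided by reduced-sweep cost
    relative_accuracy: float  # 1.0 means the reduced sweep matches the reference


def meta_metrics(reference, reduced, reference_cost_s, reduced_cost_s):
    """Compare a reduced benchmarking sweep against a larger reference sweep.

    reference, reduced: dicts mapping a configuration key to a measured
    value such as throughput (any consistent unit). Only configurations
    present in both sweeps are compared.
    """
    shared = sorted(set(reference) & set(reduced))
    if not shared:
        raise ValueError("no shared configurations to compare")
    if reduced_cost_s <= 0:
        raise ValueError("reduced_cost_s must be positive")

    # Assumed accuracy definition (hypothetical, not from the paper):
    # one minus the mean absolute relative error over shared configurations.
    errors = [abs(reduced[k] - reference[k]) / abs(reference[k]) for k in shared]
    accuracy = 1.0 - sum(errors) / len(errors)

    return MetaMetrics(
        speedup=reference_cost_s / reduced_cost_s,
        relative_accuracy=accuracy,
    )
```

As a usage example with made-up values, `meta_metrics({"a": 100.0, "b": 400.0}, {"a": 95.0, "b": 410.0}, 1200.0, 300.0)` reports a 4x cost reduction and a relative accuracy of about 0.96 under the assumed definition.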
