TorchTraceAP: A New Benchmark Dataset for Detecting Performance Anti-Patterns in Computer Vision Models
Hanning Chen, Keyu Man, Kevin Zhu, Chenguang Zhu, Haonan Li, Tongbo Luo, Xizhou Feng, Wei Sun, Sreen Tallam, Mohsen Imani, Partha Kanuparthy

TL;DR
This paper introduces TorchTraceAP, a benchmark dataset and an iterative detection method combining lightweight ML models and LLMs to identify performance anti-patterns in computer vision models' traces, improving detection accuracy.
Contribution
It provides the first benchmark dataset for anti-pattern detection in traces and proposes a novel iterative approach combining ML and LLMs for better detection.
Findings
The proposed method outperforms unsupervised clustering and rule-based techniques.
It effectively compensates for LLMs' limited context and reasoning.
The dataset includes over 600 diverse PyTorch traces.
Abstract
Identifying and addressing performance anti-patterns in machine learning (ML) models is critical for efficient training and inference, but it typically demands deep expertise spanning system infrastructure, ML models, and kernel development. While large tech companies rely on dedicated ML infrastructure engineers to analyze torch traces and benchmarks, such resource-intensive workflows are largely inaccessible to computer vision researchers in general. Among the challenges, pinpointing problematic trace segments within lengthy execution traces remains the most time-consuming task and is difficult to automate with current ML models, including LLMs. In this work, we present the first benchmark dataset specifically designed to evaluate and improve ML models' ability to detect anti-patterns in traces. Our dataset contains over 600 PyTorch traces from diverse computer vision models…
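To make the idea of narrowing a long trace down to a few suspect segments concrete, here is a minimal Python sketch. It is not the authors' method: it assumes the trace is in the Chrome-trace JSON layout that the PyTorch profiler exports (a `traceEvents` list of events with `ph`, `ts`, and `dur` fields), uses a crude idle-time heuristic as a stand-in for a lightweight model, and passes only flagged windows to a hypothetical `review` callable that stands in for an LLM. The names `load_events`, `split_windows`, `idle_fraction`, and `detect`, and the `window_size` and `threshold` parameters, are all illustrative assumptions.

```python
import json
from typing import Callable


def load_events(trace_json: str) -> list[dict]:
    """Parse Chrome-trace JSON and keep complete ('X') events, sorted by start time."""
    data = json.loads(trace_json)
    events = data["traceEvents"] if isinstance(data, dict) else data
    return sorted(
        (e for e in events if e.get("ph") == "X" and "dur" in e),
        key=lambda e: e["ts"],
    )


def split_windows(events: list[dict], size: int) -> list[list[dict]]:
    """Cut the event list into consecutive fixed-size windows."""
    return [events[i:i + size] for i in range(0, len(events), size)]


def idle_fraction(window: list[dict]) -> float:
    """Share of the window's wall time not covered by any event (assumed heuristic)."""
    start = window[0]["ts"]
    end = max(e["ts"] + e["dur"] for e in window)
    span = end - start
    if span <= 0:
        return 0.0
    busy, cursor = 0, start
    for e in window:  # events are sorted by ts, so a single sweep merges overlaps
        s, t = max(e["ts"], cursor), e["ts"] + e["dur"]
        if t > s:
            busy += t - s
            cursor = t
    return 1 - busy / span


def detect(
    trace_json: str,
    review: Callable[[list[dict]], str],  # hypothetical stand-in for an LLM call
    window_size: int = 4,
    threshold: float = 0.5,
) -> list[tuple[int, str]]:
    """Run the cheap filter on every window; call `review` only on flagged ones."""
    windows = split_windows(load_events(trace_json), window_size)
    return [
        (i, review(w))
        for i, w in enumerate(windows)
        if idle_fraction(w) > threshold
    ]
```

The design point is only that the expensive reviewer sees a small fraction of the trace, which is one way to work within a limited context window; a real detector would replace the idle-time heuristic with a trained model and would likely refine its candidate windows over several rounds.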
Taxonomy
Topics: Advanced Neural Network Applications · Anomaly Detection Techniques and Applications · Machine Learning and Data Classification
