QuArch: A Question-Answering Dataset for AI Agents in Computer Architecture
Shvetank Prakash, Andrew Cheng, Jason Yik, Arya Tschand, Radhika, Ghosal, Ikechukwu Uchendu, Jessica Quaye, Jeffrey Ma, Shreyas Grampurohit,, Sofia Giannuzzi, Arnav Balyan, Fin Amin, Aadya Pipersenia, Yash Choudhary,, Ankita Nayak, Amir Yazdanbakhsh, Vijay Janapa Reddi

TL;DR
QuArch is a new dataset of 1500 question-answer pairs aimed at evaluating and improving AI models' understanding of computer architecture, revealing performance gaps and aiding future research.
Contribution
The paper introduces QuArch, a novel dataset for assessing and enhancing language models' knowledge of computer architecture topics.
Findings
Best closed-source model achieves 84% accuracy
Small open-source model reaches 72% accuracy
Fine-tuning improves small model accuracy by up to 8%
Abstract
We introduce QuArch, a dataset of 1500 human-validated question-answer pairs designed to evaluate and enhance language models' understanding of computer architecture. The dataset covers areas including processor design, memory systems, and performance optimization. Our analysis highlights a significant performance gap: the best closed-source model achieves 84% accuracy, while the top small open-source model reaches 72%. We observe notable struggles in memory systems, interconnection networks, and benchmarking. Fine-tuning with QuArch improves small model accuracy by up to 8%, establishing a foundation for advancing AI-driven computer architecture research. The dataset and leaderboard are at https://harvard-edge.github.io/QuArch/.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Semantic Web and Ontologies · Natural Language Processing Techniques
