
TL;DR
SAIBench is a unified benchmarking system for scientific AI that uses a domain-specific language to enable flexible, modular evaluation across multiple scientific disciplines.
Contribution
The paper introduces SAIBench, a benchmarking system, and SAIL, its domain-specific language, to standardize and simplify the benchmarking of AI solutions in scientific research.
Findings
SAIBench provides a single framework for scientific AI benchmarking efforts that are currently scattered across disciplines.
SAIL enables flexible and reusable benchmarking modules.
The system adapts to various scientific problems and evaluation methods.
Abstract
Scientific research communities are embracing AI-based solutions to target tractable scientific tasks and improve research workflows. However, the development and evaluation of such solutions are scattered across multiple disciplines. We formalize the problem of scientific AI benchmarking and propose a system called SAIBench in the hope of unifying these efforts and enabling low-friction onboarding of new disciplines. The system approaches this goal with SAIL, a domain-specific language that decouples research problems, AI models, ranking criteria, and software/hardware configuration into reusable modules. We show that this approach is flexible and can adapt to problems, AI models, and evaluation methods defined from different perspectives. The project homepage is https://www.computercouncil.org/SAIBench
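The decoupling described in the abstract can be illustrated with a small Python sketch. This is not SAIL syntax and not the SAIBench implementation; the names Problem, Model, Metric, RunConfig, and rank_models are hypothetical, chosen only to show how a research problem, AI models, a ranking criterion, and a run configuration might be kept as separate, reusable modules and combined at evaluation time.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Problem:
    """A research problem: inputs plus reference targets (hypothetical)."""
    name: str
    inputs: Sequence[float]
    targets: Sequence[float]


@dataclass(frozen=True)
class Model:
    """An AI model, reduced here to a single prediction function."""
    name: str
    predict: Callable[[float], float]


@dataclass(frozen=True)
class Metric:
    """A ranking criterion that scores predictions against targets."""
    name: str
    score: Callable[[Sequence[float], Sequence[float]], float]
    lower_is_better: bool = True


@dataclass(frozen=True)
class RunConfig:
    """Software/hardware settings; only `repeats` is used in this sketch."""
    device: str = "cpu"
    repeats: int = 1


def mean_absolute_error(pred: Sequence[float], target: Sequence[float]) -> float:
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def rank_models(problem: Problem, models: Sequence[Model],
                metric: Metric, config: RunConfig) -> List[Tuple[str, float]]:
    """Evaluate every model on the problem and sort best-first."""
    results = []
    for model in models:
        scores = []
        for _ in range(config.repeats):
            preds = [model.predict(x) for x in problem.inputs]
            scores.append(metric.score(preds, problem.targets))
        results.append((model.name, sum(scores) / config.repeats))
    # Ascending order when lower scores are better, descending otherwise.
    return sorted(results, key=lambda r: r[1],
                  reverse=not metric.lower_is_better)


# Toy example: each module is defined independently, then combined.
problem = Problem("double-the-input", inputs=[1, 2, 3], targets=[2, 4, 6])
models = [Model("identity", lambda x: x), Model("double", lambda x: 2 * x)]
metric = Metric("MAE", mean_absolute_error)
ranking = rank_models(problem, models, metric, RunConfig())
print(ranking)
```

Because each module is an independent value, swapping the metric or adding a model leaves the other modules untouched, which is the kind of reuse the abstract attributes to SAIL.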
