Virology Capabilities Test (VCT): A Multimodal Virology Q&A Benchmark
Jasper Götting, Pedro Medeiros, Jon G Sanders, Nathaniel Li, Long Phan, Karam Elabd, Lennart Justen, Dan Hendrycks, Seth Donoughe

TL;DR
The paper introduces VCT, a challenging multimodal benchmark for virology troubleshooting. Advanced LLMs outperform most expert virologists on it, which raises dual-use and governance concerns.
Contribution
It presents the VCT benchmark, constructed from expert input, to evaluate LLMs' virology troubleshooting capabilities, highlighting their potential and risks.
Findings
LLMs outperform most expert virologists on VCT.
VCT is highly challenging even for experts.
Publicly available models show dual-use risks.
Abstract
We present the Virology Capabilities Test (VCT), a large language model (LLM) benchmark that measures the capability to troubleshoot complex virology laboratory protocols. Constructed from the inputs of dozens of PhD-level expert virologists, VCT consists of multimodal questions covering fundamental, tacit, and visual knowledge that is essential for practical work in virology laboratories. VCT is difficult: expert virologists with access to the internet score an average of on questions specifically in their sub-areas of expertise. However, the most performant LLM, OpenAI's o3, reaches accuracy, outperforming of expert virologists even within their sub-areas of specialization. The ability to provide expert-level virology troubleshooting is inherently dual-use: it is useful for beneficial research, but it can also be misused. Therefore, the fact that…
Taxonomy
Topics: Respiratory viral infections research · Animal Disease Management and Epidemiology
