A Survey of Code Review Benchmarks and Evaluation Practices in Pre-LLM and LLM Era

Taufiqul Islam Khan, Shaowei Wang, Haoxiang Zhang, and Tse-Hsun Chen

arXiv:2602.13377 · cs.SE · February 17, 2026

TL;DR

This survey analyzes 99 research papers on code review benchmarks published between 2015 and 2025, highlighting trends, limitations, and future directions for improving evaluation practices in both the pre-LLM and LLM eras.

Contribution

The survey provides a systematic taxonomy of code review research, analyzes existing benchmarks, and outlines future directions for more effective evaluation of LLM-based code review tools.

Findings

01. Shift toward end-to-end generative peer review
02. Increase in multilingual code review coverage
03. Decline in standalone change understanding tasks

Abstract

Code review is a critical practice in modern software engineering, helping developers detect defects early, improve code quality, and share knowledge. With the rapid advancement of large language models (LLMs), a growing body of work has explored automated support for code review. However, progress in this area is hindered by the lack of a systematic understanding of existing benchmarks and evaluation practices. Current code review datasets are scattered, vary widely in design, and provide limited insight into which review capabilities are actually being assessed. In this paper, we present a comprehensive survey of code review benchmarks spanning both the Pre-LLM and LLM eras (2015–2025). We analyze 99 research papers (58 from the Pre-LLM era and 41 from the LLM era) and extract key metadata, including datasets, evaluation metrics, data sources, and target tasks. Based on this analysis, we…

Taxonomy

Topics: Software Engineering Research · Scientific Computing and Data Management · Software Engineering Techniques and Practices