Reducing Maintenance Burden in Behaviour-Driven Development: A Paraphrase-Robust Duplicate-Step Detector with a 1.1M-Step Open Benchmark

Ali Hassaan Mughal; Noor Fatima; Muhammad Bilal

arXiv:2604.20462 · cs.SE · April 28, 2026

TL;DR

This paper introduces a large, cross-organizational benchmark and a paraphrase-robust detector for identifying duplicate steps in Behaviour-Driven Development (BDD) Gherkin files, aiming to reduce maintenance effort.

Contribution

The paper provides the largest publicly available BDD step corpus to date, a four-strategy duplicate-step detector, and a labelled pair-level benchmark for calibrating it.

Findings

01

The detector achieves an F1 score of 0.822 on near-exact duplicates.

02

Semantic detection reaches an F1 of 0.906, outperforming lexical baselines.

03

In the median repository, an estimated 62.5% of step lines are eliminable.
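
For readers less familiar with the metric in findings 01 and 02, F1 is the harmonic mean of precision and recall over predicted duplicate pairs. A minimal sketch follows; the function name `f1_score` and the representation of pairs as sets of step-ID tuples are illustrative assumptions, not the paper's evaluation code.

```python
def f1_score(predicted: set, labelled: set) -> float:
    """F1 over duplicate pairs: harmonic mean of precision and recall.

    `predicted` holds the pairs a detector flagged; `labelled` holds the
    pairs marked as duplicates in a (hypothetical) labelled benchmark.
    """
    true_positives = len(predicted & labelled)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(labelled)
    return 2 * precision * recall / (precision + recall)


# Toy data: 2 of 3 flagged pairs are correct, 2 of 4 true duplicates are found.
predicted_pairs = {(1, 2), (3, 4), (5, 6)}
labelled_pairs = {(1, 2), (3, 4), (7, 8), (9, 10)}
score = f1_score(predicted_pairs, labelled_pairs)
```

Here precision is 2/3 and recall is 1/2, so the score is 4/7.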

Abstract

Context. Behaviour-Driven Development (BDD) suites in Gherkin accumulate step-text duplication with documented maintenance cost. Prior detectors either require runnable tests or are single-organisation, leaving a gap: a static, paraphrase-robust, step-level detector and a public benchmark to calibrate it. Objective. We release (i) the largest cross-organisational BDD step corpus to date, (ii) a labelled pair-level calibration benchmark, and (iii) a four-strategy detector with a consolidation-savings model linking clusters to ISO/IEC 25010 maintainability sub-characteristics. Method. The corpus contains 347 public GitHub repositories, 23,667 .feature files, and 1,113,616 Gherkin steps, SPDX-tagged. The detector layers exact hashing, normalised Levenshtein, sentence-transformer cosine, and a Levenshtein-banded hybrid. Calibration uses 1,020 manually labelled step…
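
The first two layers named in the abstract, exact hashing and normalised Levenshtein, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the normalisation rules (lower-casing, dropping the leading Gherkin keyword, collapsing whitespace), the 0.9 threshold, and the names `normalise`, `exact_key`, and `classify_pair` are all hypothetical. The sentence-transformer and hybrid layers are omitted because the excerpt does not describe them in enough detail.

```python
import hashlib
import re


def normalise(step: str) -> str:
    """Assumed normalisation: lower-case, drop the Gherkin keyword, collapse spaces."""
    text = step.strip().lower()
    text = re.sub(r"^(given|when|then|and|but)\s+", "", text)
    return re.sub(r"\s+", " ", text)


def exact_key(step: str) -> str:
    """Hash of the normalised step; equal keys mean an exact duplicate."""
    return hashlib.sha256(normalise(step).encode("utf-8")).hexdigest()


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping one row at a time."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalised_similarity(a: str, b: str) -> float:
    """1 - distance / longer length, computed on normalised text; range [0, 1]."""
    a, b = normalise(a), normalise(b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def classify_pair(a: str, b: str, near_threshold: float = 0.9) -> str:
    """Layered check: exact hash first, then the (assumed) near-duplicate threshold."""
    if exact_key(a) == exact_key(b):
        return "exact"
    if normalised_similarity(a, b) >= near_threshold:
        return "near"
    return "distinct"
```

Running the cheap hash comparison first means the quadratic edit-distance computation is only needed for pairs that are not already byte-identical after normalisation. In practice the threshold would be chosen against labelled pairs rather than fixed by hand.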
