Deterministic vs. Probabilistic Summarisation: An Empirical Trade-off Study in Design Pattern Centric Java Code

Najam Nazar; Christoph Treude

arXiv:2605.21943·cs.SE·May 22, 2026

TL;DR

This study empirically compares deterministic (heuristic-based) and probabilistic (LLM-based) summarisation of design-pattern-centric Java code, revealing a trade-off between semantic depth and reproducibility.

Contribution

It provides a controlled empirical analysis of these paradigms using design-pattern-centric Java code, highlighting their respective strengths and limitations.

Findings

01

Probabilistic summaries have better semantic alignment and contextual coverage.

02

Deterministic approaches produce more concise and reproducible summaries.

03

Variability exists in LLM outputs, but overall trends are consistent.

Abstract

Background: Automated code summarisation supports program comprehension and documentation, yet the relative strengths and limitations of deterministic (heuristic-based) and probabilistic (LLM-based) pipelines remain unclear. Aims: This paper presents a controlled empirical comparison of these paradigms for intent-oriented design-pattern code summarisation. Method: Using design-pattern-centric Java code as a structured testbed (150 files from three open-source repositories covering nine patterns), we compare a rule-based natural language generation (NLG) pipeline, a Software Word Usage Model (SWUM)-based approach, and a probabilistic pipeline based on the Mixtral LLM. Summaries are evaluated against human references using BERTScore and cosine similarity, complemented by rubric-based judgements produced by Llama 3 across five dimensions: accuracy, conciseness, adequacy, code-context…
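To make the cosine-similarity comparison against human references concrete, here is a minimal sketch in Java. It assumes simple term-count vectors, which the abstract does not specify; the study may well compute cosine similarity over a different representation, and BERTScore is not reproduced here. The class name `CosineSimilaritySketch`, its helper methods, and the two example summaries are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class CosineSimilaritySketch {

    // Hypothetical helper: lowercase the text, split on non-alphanumeric
    // characters, and count how often each term occurs.
    static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Euclidean norm of a term-count vector.
    static double norm(Map<String, Integer> vector) {
        double sumOfSquares = 0.0;
        for (int count : vector.values()) {
            sumOfSquares += (double) count * count;
        }
        return Math.sqrt(sumOfSquares);
    }

    // Cosine similarity between the term-count vectors of two summaries.
    // Returns 0.0 when either summary contains no terms.
    static double cosine(String generated, String reference) {
        Map<String, Integer> a = termCounts(generated);
        Map<String, Integer> b = termCounts(reference);
        double dot = 0.0;
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            dot += entry.getValue() * b.getOrDefault(entry.getKey(), 0);
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (normA * normB);
    }

    public static void main(String[] args) {
        // Illustrative summaries only, not taken from the study's data.
        String reference = "Creates a single shared instance of the logger";
        String generated = "Provides a single shared logger instance";
        System.out.println(cosine(generated, reference));
    }
}
```

A lexical measure like this rewards word overlap only, so a generated summary that paraphrases the reference scores low even when the meaning matches. That limitation is one reason to pair it with an embedding-based metric such as BERTScore.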
