Where are the Hidden Gems? Applying Transformer Models for Design Discussion Detection
Lawrence Arkoh, Daniel Feitosa, Wesley K. G. Assunção

TL;DR
This paper evaluates transformer-based models like BERT, RoBERTa, XLNet, LaMini-Flan-T5, and ChatGPT-4o-mini for identifying design discussions in software engineering, demonstrating their strengths and limitations across different metrics and datasets.
Contribution
It extends prior cross-domain detection studies by incorporating modern transformer architectures and methodological improvements, providing a comprehensive comparison of their effectiveness.
Findings
BERT and RoBERTa have strong recall across domains.
XLNet achieves higher precision but lower recall.
ChatGPT-4o-mini yields the highest recall and competitive performance.
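The precision/recall contrast in these findings can be illustrated with a minimal sketch. The labels and the two "classifiers" below are invented toy data, not results from the paper; the function name `precision_recall` is hypothetical. The sketch only shows how a cautious classifier can score high precision with low recall, while a liberal one shows the opposite pattern.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy ground truth (assumption): 1 = design discussion, 0 = other discussion.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]

# A cautious classifier flags few items: no false alarms, but misses half.
cautious = [1, 1, 0, 0, 0, 0, 0, 0]

# A liberal classifier flags many items: finds all, but with false alarms.
liberal = [1, 1, 1, 1, 1, 1, 0, 0]

print("cautious:", precision_recall(y_true, cautious))
print("liberal: ", precision_recall(y_true, liberal))
```

Which profile is preferable depends on the use case: high recall suits a first-pass filter whose output a human reviews, while high precision suits settings where false alarms are costly.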
Abstract
Design decisions are at the core of software engineering and appear in Q&A forums, mailing lists, pull requests, issue trackers, and commit messages. Design discussions spanning a project's history provide valuable information for informed decision-making, such as refactoring and software modernization. Machine learning techniques have been used to detect design decisions in natural language discussions; however, their effectiveness is limited by the scarcity of labeled data and the high cost of annotation. Prior work adopted cross-domain strategies with traditional classifiers, training on one domain and testing on another. Despite their success, transformer-based models, which often outperform traditional methods, remain largely unexplored in this setting. The goal of this work is to investigate the performance of transformer-based models (i.e., BERT, RoBERTa, XLNet,…
Taxonomy
Topics: Software Engineering Research · Software Engineering Techniques and Practices · Topic Modeling
