TL;DR
This paper introduces PTM4Tag, a framework that uses pre-trained language models with a triplet architecture to recommend tags for Stack Overflow posts more accurately, reducing the noise and redundancy that poorly chosen tags introduce.
Contribution
It is the first to leverage pre-trained language models specifically for tag recommendation in Stack Overflow, demonstrating superior performance over existing deep learning methods.
Findings
CodeBERT achieves the best performance among tested PTMs.
Using all post components yields the highest accuracy.
Title is the most influential component for tag prediction.
Abstract
Stack Overflow is often viewed as the most influential Software Question Answer (SQA) website, with millions of programming-related questions and answers. Tags play a critical role in efficiently structuring the content on Stack Overflow and are vital to support a range of site operations, e.g., querying relevant content. Poorly selected tags often introduce extra noise and redundancy, which leads to tag synonym and tag explosion problems. Thus, an automated tag recommendation technique that can accurately recommend high-quality tags is desired to alleviate the problems mentioned above. Inspired by the recent success of pre-trained language models (PTMs) in natural language processing (NLP), we present PTM4Tag, a tag recommendation framework for Stack Overflow posts that utilizes PTMs with a triplet architecture, which models the components of a post, i.e., Title, Description, and Code…
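To make the triplet idea concrete, here is a minimal, standard-library-only sketch of one way such a pipeline could be wired together: one encoder per post component (Title, Description, Code), a combined representation, and a per-tag score from which the top-k tags are recommended. Everything in it is an assumption for illustration. The hashed bag-of-words encoders are stand-ins for pre-trained models, the weights are random rather than trained, concatenation is assumed as the way the three component vectors are combined, and names such as `TripletTagger`, `make_encoder`, and `TAGS` are hypothetical. It is not the authors' implementation.

```python
import hashlib
import math
import random

DIM = 16  # size of each component embedding (arbitrary for this sketch)
TAGS = ["python", "java", "pandas", "javascript", "sql"]  # hypothetical tag vocabulary
COMPONENTS = ("title", "description", "code")


def make_encoder(salt):
    """Return a stand-in encoder: a normalized hashed bag-of-words vector.

    A real system would call a pre-trained language model here; this
    placeholder only mimics the text-in, fixed-size-vector-out interface.
    """
    def encode(text):
        vec = [0.0] * DIM
        for token in text.lower().split():
            digest = hashlib.md5((salt + token).encode("utf-8")).digest()
            vec[digest[0] % DIM] += 1.0
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
        return [v / norm for v in vec]
    return encode


class TripletTagger:
    """Three component encoders feeding one multi-label scoring layer."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        # One separate encoder per post component (the "triplet").
        self.encoders = {name: make_encoder(name) for name in COMPONENTS}
        # Untrained random weights: one row of size 3 * DIM per tag.
        self.weights = [[rng.uniform(-1, 1) for _ in range(3 * DIM)] for _ in TAGS]
        self.bias = [0.0] * len(TAGS)

    def represent(self, post):
        # Assumption: component vectors are combined by concatenation.
        features = []
        for name in COMPONENTS:
            features.extend(self.encoders[name](post.get(name, "")))
        return features

    def scores(self, post):
        # Independent sigmoid score per tag (multi-label setting).
        x = self.represent(post)
        result = {}
        for tag, w, b in zip(TAGS, self.weights, self.bias):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            result[tag] = 1.0 / (1.0 + math.exp(-z))
        return result

    def recommend(self, post, k=3):
        # Return the k tags with the highest scores.
        s = self.scores(post)
        return sorted(s, key=s.get, reverse=True)[:k]


if __name__ == "__main__":
    post = {
        "title": "How to merge two dataframes",
        "description": "I have two tables that share a key column",
        "code": "df.merge(other, on='id')",
    }
    print(TripletTagger(seed=0).recommend(post, k=3))
```

Because the weights are random, the printed tags carry no meaning; the point is only the data flow. In a trained system, each stand-in encoder would be replaced by a pre-trained model and the scoring layer would be fitted on labeled posts.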
