Software Mention Recognition with a Three-Stage Framework Based on BERTology Models at SOMD 2024
Thuy Nguyen Thi, Anh Nguyen Viet, Thin Dang Van, Ngan Nguyen Luu Thuy

TL;DR
This paper presents a three-stage framework built on pre-trained language models (BERT, SciBERT, and XLM-R) for software mention recognition in scholarly publications. The framework achieved competitive results and ranked third in the SOMD 2024 shared task.
Contribution
The paper introduces a three-stage approach to software mention recognition in scientific texts (sentence classification, mention extraction, then mention type classification) and compares it across several pre-trained language models.
Findings
XLM-R-based model achieved a weighted F1-score of 67.80%
Framework ranked third in the SOMD 2024 shared task and outperformed the authors' alternative approaches
Three-stage approach effectively detects and classifies software mentions
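For readers unfamiliar with the metric, a weighted F1-score is generally the mean of per-class F1 scores, with each class weighted by its share of the gold labels. The sketch below illustrates that general definition only. The function name `weighted_f1` and the labels `TYPE_A` and `TYPE_B` are hypothetical, and this summary does not state whether the shared task scores at token level or entity level, so this is not the official scorer.

```python
from collections import Counter


def weighted_f1(gold, pred):
    """Support-weighted mean of per-class F1 over two aligned label lists.

    Illustrative only: the label granularity (token vs. entity) used by the
    shared task is not specified here, so this is an assumption.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    support = Counter(gold)  # number of gold items per class
    total = len(gold)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = n - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += (n / total) * f1  # weight each class by its gold support
    return score


# Hypothetical labels, for illustration only.
gold = ["TYPE_A", "TYPE_A", "TYPE_B", "TYPE_B"]
pred = ["TYPE_A", "TYPE_B", "TYPE_B", "TYPE_B"]
score = weighted_f1(gold, pred)
```

Because each class is weighted by its frequency, frequent classes dominate the score, which matters when mention types are imbalanced.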
Abstract
This paper describes our systems for Sub-task I of the Software Mention Detection in Scholarly Publications shared task. We propose three approaches leveraging different pre-trained language models (BERT, SciBERT, and XLM-R) to tackle this challenge. Our best-performing system addresses the named entity recognition (NER) problem through a three-stage framework: (1) Entity Sentence Classification - classifies sentences containing potential software mentions; (2) Entity Extraction - detects mentions within classified sentences; (3) Entity Type Classification - categorizes detected mentions into specific software types. Experiments on the official dataset demonstrate that our three-stage framework achieves competitive performance, surpassing both other participating teams and our alternative approaches. As a result, our framework based on the XLM-R-based model achieves a weighted…
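The control flow of the three stages described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `three_stage_ner`, `Mention`, and the `toy_*` functions are hypothetical, and the rule-based stand-ins take the place of the fine-tuned BERT, SciBERT, or XLM-R models that each stage would use in practice.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # token start (inclusive), token end (exclusive)


@dataclass
class Mention:
    tokens: List[str]
    span: Span
    label: str


def three_stage_ner(
    sentences: List[List[str]],
    has_mention: Callable[[List[str]], bool],
    extract_spans: Callable[[List[str]], List[Span]],
    classify_type: Callable[[List[str], Span], str],
) -> List[Mention]:
    """Chain the three stages; each callable stands in for one model."""
    mentions = []
    for tokens in sentences:
        # Stage 1: keep only sentences predicted to contain a mention.
        if not has_mention(tokens):
            continue
        # Stage 2: locate mention spans inside the retained sentence.
        for start, end in extract_spans(tokens):
            # Stage 3: assign a software type to each extracted span.
            label = classify_type(tokens, (start, end))
            mentions.append(Mention(tokens[start:end], (start, end), label))
    return mentions


# Toy rule-based stand-ins (assumptions for illustration, not real models).
KNOWN_SOFTWARE = {"SPSS"}


def toy_has_mention(tokens):
    return any(t in KNOWN_SOFTWARE for t in tokens)


def toy_extract_spans(tokens):
    return [(i, i + 1) for i, t in enumerate(tokens) if t in KNOWN_SOFTWARE]


def toy_classify_type(tokens, span):
    return "SOFTWARE"  # placeholder label, not a label from the task


sentences = [
    "We analysed the data with SPSS .".split(),
    "Participants were recruited online .".split(),
]
result = three_stage_ner(
    sentences, toy_has_mention, toy_extract_spans, toy_classify_type
)
```

One consequence of this design is that errors propagate: a sentence rejected in stage 1 can never yield a mention in stages 2 and 3, so the first classifier's recall bounds the recall of the whole pipeline.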
Taxonomy
Topics: Software Engineering Techniques and Practices · Software Engineering Research · Software System Performance and Reliability
