Entity Set Co-Expansion in StackOverflow

Yu Zhang; Yunyi Zhang; Yucheng Jiang; Martin Michalski; Yu Deng; Lucian Popa; ChengXiang Zhai; Jiawei Han

arXiv:2212.02271 · cs.CL · December 6, 2022

TL;DR

This paper introduces SECoExpan, a framework for co-expanding multiple entity types in StackOverflow using pre-trained language models, significantly improving entity extraction for software knowledge graph construction.

Contribution

The paper presents a novel co-expansion approach that handles multiple seed entity types simultaneously and leverages PLMs for improved accuracy.

Findings

01. SECoExpan outperforms previous methods significantly.

02. Utilizes PLMs to derive entity embeddings for similarity calculation.

03. Effectively extracts multiple software-related entity types from StackOverflow.

Abstract

Given a few seed entities of a certain type (e.g., Software or Programming Language), entity set expansion aims to discover an extensive set of entities that share the same type as the seeds. Entity set expansion in software-related domains such as StackOverflow can benefit many downstream tasks (e.g., software knowledge graph construction) and facilitate better IT operations and service management. Meanwhile, existing approaches are less concerned with two problems: (1) How to deal with multiple types of seed entities simultaneously? (2) How to leverage the power of pre-trained language models (PLMs)? Being aware of these two problems, in this paper, we study the entity set co-expansion task in StackOverflow, which extracts Library, OS, Application, and Language entities from StackOverflow question-answer threads. During the co-expansion process, we use PLMs to derive embeddings of…
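As a rough illustration of the co-expansion idea described in the abstract, the sketch below assigns each candidate entity to the seed type it most resembles, so that the seeds of the other types act as competing evidence. It is not the authors' SECoExpan implementation: the function names `cosine` and `co_expand`, the `margin` parameter, and the two-dimensional toy vectors (standing in for PLM-derived embeddings) are all assumptions made for this example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def co_expand(embeddings, seeds, margin=0.0):
    """Assign each non-seed entity to the type whose seeds it resembles most.

    embeddings: dict mapping entity name -> vector (hypothetical stand-in
                for embeddings derived from a pre-trained language model).
    seeds:      dict mapping type name -> list of seed entity names.
    margin:     hypothetical threshold; an entity is kept only if its best
                type beats the runner-up type by more than this amount.
    """
    seed_names = {name for names in seeds.values() for name in names}
    expanded = {etype: [] for etype in seeds}
    for name, vec in embeddings.items():
        if name in seed_names:
            continue
        # Score against every type at once: mean similarity to its seeds.
        scores = {
            etype: sum(cosine(vec, embeddings[s]) for s in names) / len(names)
            for etype, names in seeds.items()
        }
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        best_type, best_score = ranked[0]
        runner_up = ranked[1][1] if len(ranked) > 1 else float("-inf")
        if best_score - runner_up > margin:
            expanded[best_type].append((name, best_score))
    # Most similar entities first within each type.
    for etype in expanded:
        expanded[etype].sort(key=lambda pair: pair[1], reverse=True)
    return expanded

# Toy vectors only; real embeddings would come from a PLM.
embeddings = {
    "python": [1.0, 0.1], "java": [0.9, 0.2], "rust": [0.95, 0.15],
    "linux": [0.1, 1.0], "windows": [0.2, 0.9], "macos": [0.15, 0.95],
}
seeds = {"Language": ["python", "java"], "OS": ["linux", "windows"]}
result = co_expand(embeddings, seeds)
```

Scoring every candidate against all types together, rather than expanding each type in isolation, is what lets one type's seeds rule out candidates for another; raising `margin` would drop entities that sit ambiguously between types.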

Taxonomy

Topics: Topic Modeling · Data Quality and Management · Software System Performance and Reliability

Methods: travel james · Lib · Attentive Walk-Aggregating Graph Neural Network