WikiDoMiner: Wikipedia Domain-specific Miner
Saad Ezzini, Sallam Abualhaija, Mehrdad Sabetzadeh

TL;DR
WikiDoMiner is a tool that automatically builds domain-specific corpora. It extracts keywords from a requirements specification and queries Wikipedia for them, which supports requirements engineering tasks in domains where domain-specific data is scarce.
Contribution
The paper introduces WikiDoMiner, a novel open-source tool that generates domain-specific Wikipedia corpora from requirements specifications, addressing the scarcity of domain-specific datasets.
Findings
Retrieves Wikipedia articles relevant to the domain keywords extracted from an input requirements specification.
Can support requirements engineering tasks such as ambiguity handling, requirements classification, and question answering.
Open-source availability promotes adoption and further research.
Abstract
We introduce WikiDoMiner, a tool for automatically generating domain-specific corpora by crawling Wikipedia. WikiDoMiner helps requirements engineers create an external knowledge resource that is specific to the underlying domain of a given requirements specification (RS). Being able to build such a resource is important since domain-specific datasets are scarce. WikiDoMiner generates a corpus by first extracting a set of domain-specific keywords from a given RS, and then querying Wikipedia for these keywords. The output of WikiDoMiner is a set of Wikipedia articles relevant to the domain of the input RS. Mining Wikipedia for domain-specific knowledge can be beneficial for multiple requirements engineering tasks, e.g., ambiguity handling, requirements classification, and question answering. WikiDoMiner is publicly available on Zenodo under an open-source license (DOI:…
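The two-step pipeline in the abstract (extract domain keywords from an RS, then query Wikipedia for each keyword) can be sketched as follows. This is a minimal illustration, not WikiDoMiner's implementation. Keyword extraction here is a naive frequency count over single words with a hand-picked stopword list; that technique is an assumption, since this page does not describe how the tool actually selects keywords. The helper names `extract_keywords` and `wikipedia_search_url` are hypothetical. To keep the sketch runnable offline, the query step only builds a request URL for the public MediaWiki Action API search module (`action=query&list=search`) and does not send it.

```python
import re
from collections import Counter
from urllib.parse import urlencode

# Hypothetical stopword list (an assumption); a real tool would rely on an NLP library.
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "on", "for", "by",
    "with", "be", "is", "are", "shall", "should", "must", "may", "can",
    "system", "that", "this", "when", "if", "it", "its", "as", "at", "from",
}


def extract_keywords(requirements, top_k=5):
    """Rank single-word terms by frequency across all requirements.

    A naive stand-in for domain keyword extraction, not WikiDoMiner's method.
    """
    counts = Counter()
    for req in requirements:
        # Lowercase word-like tokens of at least two characters (hyphens allowed).
        tokens = re.findall(r"[a-z][a-z\-]+", req.lower())
        counts.update(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    # Highest frequency first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]


def wikipedia_search_url(keyword, limit=10):
    """Build (but do not send) a MediaWiki Action API full-text search request."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": keyword,
        "srlimit": limit,
        "format": "json",
    }
    return "https://en.wikipedia.org/w/api.php?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical requirements, invented only for illustration.
    rs = [
        "The satellite shall downlink telemetry data to the ground station every orbit.",
        "The ground station shall archive telemetry received from the satellite.",
        "Telemetry packets shall be checksummed before downlink.",
    ]
    keywords = extract_keywords(rs, top_k=3)
    print(keywords)
    for kw in keywords:
        print(wikipedia_search_url(kw))
```

To complete the corpus, one would fetch each URL, read the article titles from the JSON response, and download those articles. Counting single words is the simplest choice. Multi-word domain terms such as "ground station" would need phrase extraction, for example noun-phrase chunking.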
Taxonomy
Topics: Wikis in Education and Collaboration · Natural Language Processing Techniques · Software Engineering Research
