Taxonomy-as-a-Service: How To Structure Your Related Work

Mohsen Ahmadvand; Amjad Ibrahim; Felix Huber

arXiv:1906.11217·cs.DL·June 27, 2019

TL;DR

This paper introduces Taxonomy-as-a-Service (TaaS), a platform designed to streamline the creation and maintenance of research taxonomies, making literature review and concept classification more systematic, efficient, and less error-prone.

Contribution

We propose a novel TaaS platform that integrates literature review, taxonomy development, visualization, and analysis to improve the process of structuring related work.

Findings

1. TaaS effectively supports UML-conforming taxonomy creation.
2. The platform enhances efficiency in taxonomy development.
3. It facilitates maintenance and updates of taxonomies.

Technische Universität München

(2019)

Abstract.

Structuring related work is a daunting task encompassing literature review, classification, comparison (primarily in the form of concepts), and gap analysis. Building taxonomies is a compelling way to structure concepts in the literature, yielding reusable and extensible models. However, constructing taxonomies as a product of literature reviews can become, in our experience, immensely complex and error-prone. Including new literature or addressing errors may cause substantial changes (ripple effects) in taxonomies, and coping with these changes requires adequate tools. To this end, we propose a Taxonomy-as-a-Service (TaaS) platform. TaaS combines the systematic paper review process with taxonomy development, visualization, and analysis capabilities. We evaluate the effectiveness and efficiency of our platform by employing it in the development of a real-world taxonomy. Our results indicate that our TaaS can be used to effectively craft and maintain UML-conforming taxonomies and thereby structure related work. The screencast of our tool demonstration is available at https://goo.gl/GsTjsP.

Keywords: Taxonomy, Ontology Visualization, Research Tools

1. Introduction

Researchers often produce a taxonomy (ontology) that abstracts concepts found in the published literature around a specific topic and relates them. (Although ontologies express more complex relations between concepts than taxonomies, we use the two terms interchangeably.) A taxonomy aids its constructor in coping with the growing number and complexity of concepts found in the literature and hence facilitates a thorough literature review process. Taxonomies also serve as a communication tool supporting the understandability of concepts.

Researchers usually model different views of their research domain in a taxonomy. We refer to each view as a taxonomy dimension. A dimension groups the concepts related to a specific artifact or a perspective on the research topic. For instance, in the security domain, researchers may structure concepts from the attacker or defender perspectives (Ahmadvand et al., 2018), or according to the what and the how aspects of protection methods. A dimension may also reflect a process view of the system in which each dimension abstracts a specific phase.

Primarily, a taxonomy comprises a set of interrelated concepts. There are two types of relationships among concepts: inter-relations (relations among concepts in different dimensions) and intra-relations (relations among concepts within a single dimension). While there exists a wide range of relation types, UML relations seem to support a sufficient set of semantics to express a wide range of taxonomies (Parreiras et al., 2010). In particular, class diagrams with their built-in relations, viz. association, inheritance, composition, and aggregation, are good candidates for modeling taxonomy dimensions (Ahmadvand et al., 2018). Further refined relations are also possible by annotating the given relationships.
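To make the distinction between inter-relations and intra-relations concrete, the following is a minimal Go sketch of a taxonomy data model. The type names, field names, and example concepts are assumptions made for illustration; they are not taken from the TaaS code base.

```go
package main

import "fmt"

// RelationKind lists the UML class-diagram relations named in the text.
type RelationKind string

const (
	Association RelationKind = "association"
	Inheritance RelationKind = "inheritance"
	Composition RelationKind = "composition"
	Aggregation RelationKind = "aggregation"
)

// Concept is a taxonomy node. Dimension names the view it belongs to.
type Concept struct {
	Name      string
	Dimension string
}

// Relation connects two concepts. Annotation optionally refines the UML kind.
type Relation struct {
	From, To   Concept
	Kind       RelationKind
	Annotation string
}

// IsInter reports whether the relation crosses dimensions (an inter-relation);
// otherwise it is an intra-relation within one dimension.
func (r Relation) IsInter() bool {
	return r.From.Dimension != r.To.Dimension
}

func main() {
	// The concepts below are made up for illustration only.
	tampering := Concept{Name: "Tampering", Dimension: "Attack"}
	patching := Concept{Name: "Binary patching", Dimension: "Attack"}
	hashing := Concept{Name: "Hashing", Dimension: "Defense"}

	relations := []Relation{
		{From: patching, To: tampering, Kind: Inheritance},
		{From: hashing, To: tampering, Kind: Association, Annotation: "mitigates"},
	}
	for _, r := range relations {
		fmt.Println(r.From.Name, r.Kind, r.To.Name, "inter:", r.IsInter())
	}
}
```

Storing the dimension on the concept means the relation type never has to be told whether it is inter- or intra-dimensional; that property follows from its two endpoints.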

Crafting a taxonomy starts with a literature review. Two systematic review methodologies are widely practiced in the research community: SLR (Kitchenham and Charters, 2007) and SMS (Petersen et al., 2008). They are time-consuming, require substantial manual effort, and are error-prone. Hence, automating all (or parts) of them is beneficial. Notwithstanding the differences in process, a literature review in essence supplies the concepts in the field and their relations, upon which a taxonomy is built. The missing element here is tool support for crafting taxonomies as the outcome of reviews.

After constructing a taxonomy, researchers analyze it thoroughly and keep on maintaining and evolving it. These activities are strikingly complex and error-prone as the number of concepts and papers increases. Fixing errors such as misclassification, duplicates, and overlooked concepts could render all the previously gathered reports (analyses) obsolete.

Gaps. To the best of our knowledge, the gaps in the literature (see Section 6 for the related work) are: i) there is a lack of adequate tool support for developing and maintaining taxonomies as a product of SLR or SMS; and ii) the existing tools offer only limited structural and gap analysis capabilities, and they do not facilitate the process of correcting and extending taxonomies.

Contributions. Our contributions are manifold: i) we elicit requirements for a taxonomy development and maintenance service for crafting UML-like taxonomies; ii) we propose an architecture complying with the elicited requirements; iii) we develop interactive visualization tools for crafting and analyzing taxonomies; iv) we thoroughly evaluate the tool using a real-world taxonomy; and v) we open source the entire tool chain.

2. Requirements

In this section, we elicit the system requirements in accordance with the process proposed by Zowghi and Coulin (2005). Due to space limitations, we only list the requirements.

2.1. Functional Requirements

FR1. The system should provide users with a workspace to review papers, create, and update taxonomies.

FR2. Present a mechanism to import the literature to be reviewed. Interfaces to upload different formats of literature (e.g., PDF or DOI) should be supported.

FR3. Facilitate defining, editing, merging, and relating concepts by multiple researchers.

FR4. Support creating a multidimensional visual model of the identified concepts. UML relations of type association, inheritance, and composition shall be supported and distinctly visualized. Annotation is also supported to constitute more specialized relations.

FR5. Enable correlating different concepts, and displaying the literature coverage around them.

FR6. Support clutter-free visualizations of the hierarchy of the concepts via 2D and 3D matrix views with zoom and filtering features.

FR7. Enable mass literature mapping using keyword matching techniques to update existing taxonomies.

2.2. Non-Functional Requirements

NFR1. Scalability, Multi-tenancy, and Deployability. Since the number of published articles grows significantly over the years (rel, [n.d.]), the system should scale up and down based on the load.

NFR2. Security. Secure access to unpublished research artifacts.

NFR3. Fast viewing. Render taxonomy views by keeping caches of highly demanded visualizations.

3. Design

3.1. Architecture

To fully satisfy NFR1, we adopt a microservice-based architecture. It also partially addresses the security requirement NFR2 (e.g., isolation and authentication), as we discuss in Section 4.

As depicted in Figure 1, our microservices are user management, collective literature survey, literature importer, taxonomy builder, analysis engine, and visualization engine.

3.2. Taxonomy Development Process

The process starts by formulating research questions and keywords. It is followed by gathering literature with the specified keywords. The collected articles are then input into the system. As the first step, they are fed into the collective review microservice, whereby researchers vote on the relevance of the articles to the research questions of interest. During the review process, papers are marked with a set of classification tags, which can be imported as (preliminary) concepts into a taxonomy. The taxonomy builder then enables researchers to extend the preliminary classifications further.

Once a taxonomy is crafted, users can utilize the analysis and visualization engines for a thorough analysis, or compile reports. From this point on, using the literature importer, new papers can be mapped to the concepts of existing taxonomies based on the keyword matching techniques provided by the platform.

3.3. Services

3.3.1. User management

Essentially, this service handles user authentication by issuing access tokens that enable users to interact with the other services in the system. This service, together with the other technologies used in the implementation of our services (see Section 4), addresses NFR2.

3.3.2. Collective literature survey

Once researchers have gathered the related work (from various sources), they import it into the survey service. The service then allows coworkers (researchers) to conduct a collective review in which they read the abstracts of papers and vote to include or exclude them, based on their relevance to the research questions of interest. The approved papers can be fetched at any time by specifying the minimum number of positive votes. Such papers are then analyzed (read) in depth by individual researchers for a final decision. Papers can be tagged with arbitrary keywords as well as notes. These keywords can later be translated directly into concepts (in a taxonomy), or be used to derive other concepts. This service in part addresses the FR1, FR2, and FR3 requirements.
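The vote-based selection described above can be sketched in a few lines of Go. This is an illustrative sketch only: the `Paper` type, its fields, and the `Approved` function are hypothetical names and do not reflect the actual survey service API.

```go
package main

import "fmt"

// Paper holds one imported article and the votes cast on its abstract.
type Paper struct {
	Title string
	Votes map[string]bool // reviewer name -> true (include) or false (exclude)
	Tags  []string        // free-form keywords assigned during review
}

// PositiveVotes counts the reviewers who voted to include the paper.
func PositiveVotes(p Paper) int {
	n := 0
	for _, include := range p.Votes {
		if include {
			n++
		}
	}
	return n
}

// Approved returns the papers with at least minPositive include votes,
// preserving the input order.
func Approved(papers []Paper, minPositive int) []Paper {
	var out []Paper
	for _, p := range papers {
		if PositiveVotes(p) >= minPositive {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	// Titles and reviewer names are placeholders.
	papers := []Paper{
		{Title: "Paper A", Votes: map[string]bool{"r1": true, "r2": true}, Tags: []string{"hashing"}},
		{Title: "Paper B", Votes: map[string]bool{"r1": true, "r2": false}},
	}
	for _, p := range Approved(papers, 2) {
		fmt.Println(p.Title, p.Tags)
	}
}
```

Keeping the threshold as a query parameter rather than a stored decision mirrors the text: the approved set can be fetched at any time with a different minimum number of positive votes.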

3.3.3. Taxonomy builder

The builder itself comprises three components: the inter-dimensional editor, the intra-dimensional editor, and the tag-to-concept importer.

Inter-dimensional editor: This service enables users to create the dimensions of a taxonomy along with their inter-relationships. It is the view in which all the concepts of each dimension and their inter-relations with other dimensions are created and maintained over time. The inter-dimensional view captures a high-level notion of the taxonomy. However, each dimension, specific to a particular aspect of the field, needs to be further developed on its own.

Intra-dimensional editor: In a sense, the intra-dimensional editor provides a zoomed-in view of a dimension of interest, whereby all the concepts in a dimension are extended with their (sub)concepts and their further instantiations. Relationships between concepts can be defined in the form of UML relations (aggregation, composition, inheritance, and association). All the relations support annotations to capture arbitrary semantics. Moreover, fork and merge features are supported to deal with potential mistakes caused by the collectively gathered tags, which contributes to addressing FR3. In all the operations of the editor, we utilize an eventual cache consistency policy to honor NFR3.

Tag-to-concept importer: Papers tagged throughout the review process can be imported directly into a taxonomy. This service contributes to addressing the FR1, FR3, and FR4 requirements.
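As an illustration of how the tag-to-concept import and the merge feature could interact, consider the sketch below. It assumes a deliberately simple in-memory representation (a map from concept name to the set of paper IDs); the `Taxonomy` type and its methods are hypothetical and are not the platform's actual data model.

```go
package main

import "fmt"

// Taxonomy maps each concept name to the set of paper IDs mapped to it.
type Taxonomy struct {
	Papers map[string]map[string]bool
}

// NewTaxonomy returns an empty taxonomy.
func NewTaxonomy() *Taxonomy {
	return &Taxonomy{Papers: make(map[string]map[string]bool)}
}

// ImportTags turns every tag of a reviewed paper into a (preliminary)
// concept and maps the paper to it.
func (tx *Taxonomy) ImportTags(paperID string, tags []string) {
	for _, tag := range tags {
		if tx.Papers[tag] == nil {
			tx.Papers[tag] = make(map[string]bool)
		}
		tx.Papers[tag][paperID] = true
	}
}

// Merge folds concept src into concept dst: all papers mapped to src
// become mapped to dst, and src is removed.
func (tx *Taxonomy) Merge(src, dst string) {
	if src == dst {
		return
	}
	if tx.Papers[dst] == nil {
		tx.Papers[dst] = make(map[string]bool)
	}
	for id := range tx.Papers[src] {
		tx.Papers[dst][id] = true
	}
	delete(tx.Papers, src)
}

func main() {
	tx := NewTaxonomy()
	// Two reviewers used slightly different tags for the same idea.
	tx.ImportTags("paper-1", []string{"hash", "obfuscation"})
	tx.ImportTags("paper-2", []string{"hashing"})
	tx.Merge("hash", "hashing")
	fmt.Println(len(tx.Papers["hashing"]), "papers mapped to hashing")
}
```

Merging on the mapping rather than on the tags themselves keeps the review data untouched, so a mistaken merge can in principle be revisited from the original tags.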

3.3.4. Literature importer

Using this service, one can upload recent or newly discovered literature to update a taxonomy. The service provides four keyword matching methods, namely regex, Dice coefficient, Levenshtein distance (Gomaa and Fahmy, 2013), and Fuzzysort (https://github.com/farzher/fuzzysort), for a preliminary mapping of papers to the concepts in a taxonomy (FR7). Researchers can further refine the suggested mappings in the process.
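For readers unfamiliar with two of the string similarity measures named above, the following Go sketch shows textbook versions of the Dice coefficient (over character bigrams) and the Levenshtein edit distance. These are standard formulations written for illustration; how the platform tokenizes text or applies thresholds to these scores is not specified here, and the function names are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// bigrams returns the multiset of lower-cased character bigrams of s.
func bigrams(s string) map[string]int {
	r := []rune(strings.ToLower(s))
	m := make(map[string]int)
	for i := 0; i+1 < len(r); i++ {
		m[string(r[i:i+2])]++
	}
	return m
}

// Dice returns 2*|X ∩ Y| / (|X| + |Y|) over character bigrams, in [0, 1].
// It returns 0 when neither string contains a bigram.
func Dice(a, b string) float64 {
	x, y := bigrams(a), bigrams(b)
	total, overlap := 0, 0
	for _, c := range y {
		total += c
	}
	for g, cx := range x {
		total += cx
		if cy := y[g]; cy < cx {
			overlap += cy
		} else {
			overlap += cx
		}
	}
	if total == 0 {
		return 0
	}
	return 2 * float64(overlap) / float64(total)
}

// Levenshtein returns the minimum number of single-character insertions,
// deletions, and substitutions needed to turn a into b.
func Levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		cur := make([]int, len(rb)+1)
		cur[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			best := prev[j-1] + cost // substitution or match
			if prev[j]+1 < best {
				best = prev[j] + 1 // deletion
			}
			if cur[j-1]+1 < best {
				best = cur[j-1] + 1 // insertion
			}
			cur[j] = best
		}
		prev = cur
	}
	return prev[len(rb)]
}

func main() {
	fmt.Println(Dice("night", "nacht"))           // 0.25
	fmt.Println(Levenshtein("kitten", "sitting")) // 3
}
```

The Dice coefficient rewards shared fragments regardless of position, while the Levenshtein distance penalizes every edit, so the two tend to disagree on long strings that share only a common stem.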

3.3.5. Analysis engine

The two core analyses are the correlation generator and the filtering service. This service contributes to addressing FR5 and FR6 requirements.

3.3.6. Visualization engine

To aid the development and understandability of taxonomies, the visualization engine supports three distinct techniques, viz. hierarchy-matrix, 3D, and CropCircles (Wang and Parsia, 2006) visualizations. The hierarchy-matrix view combines a matrix visualization with a hierarchical tree view of the taxonomy. Every cell in the matrix reports the number of papers that are mapped to both of the concepts corresponding to its position on the $x$- and $y$-axes. The 3D view extends the matrix view by mapping an arbitrary property (such as the number of citations) to the $z$-axis.
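The cell values of such a matrix can be computed as a co-occurrence count. The Go sketch below assumes the paper-to-concept mapping is available as a plain map; the function name and input shape are illustrative assumptions, not the visualization engine's interface.

```go
package main

import "fmt"

// CoOccurrence returns a matrix m where m[i][j] is the number of papers
// mapped to both concepts[i] and concepts[j]. The diagonal m[i][i] is the
// number of papers mapped to concepts[i]. The papers argument maps a paper
// ID to the names of the concepts it is mapped to; unknown names are ignored.
func CoOccurrence(concepts []string, papers map[string][]string) [][]int {
	index := make(map[string]int, len(concepts))
	for i, c := range concepts {
		index[c] = i
	}
	m := make([][]int, len(concepts))
	for i := range m {
		m[i] = make([]int, len(concepts))
	}
	for _, mapped := range papers {
		// Collect the distinct known concept indices for this paper so
		// that a repeated concept name is not counted twice.
		seen := make(map[int]bool)
		for _, c := range mapped {
			if i, ok := index[c]; ok {
				seen[i] = true
			}
		}
		for i := range seen {
			for j := range seen {
				m[i][j]++
			}
		}
	}
	return m
}

func main() {
	// Concept names and paper IDs are placeholders.
	concepts := []string{"Hashing", "Tampering", "Obfuscation"}
	papers := map[string][]string{
		"p1": {"Hashing", "Tampering"},
		"p2": {"Hashing"},
		"p3": {"Hashing", "Tampering", "Obfuscation"},
	}
	for i, row := range CoOccurrence(concepts, papers) {
		fmt.Println(concepts[i], row)
	}
}
```

A third value per cell, such as a citation count summed over the co-occurring papers, would supply the height used by a 3D view.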

The CropCircles visualization offers a clear hierarchy of the concepts that have parent-child relationships, grouped by their corresponding dimensions. Therefore, it provides a better understanding of a taxonomy’s topology. Users can zoom into circles to explore related concepts.

All the views offer export as images (in PNG format). The visualization service contributes to addressing FR6 and FR7 requirements.

4. Implementation

The entire platform (implemented with Go, MySQL, and HTML5) is open source and publicly available on GitHub at https://github.com/mr-ma/paper-review-go.

4.1. Modules

We split each microservice into a set of goal-oriented modules according to the proposed design (see Section 3). Figure 3 captures our modules per microservice.

4.2. Deployment

For the deployment of our services, we utilize a container-based approach (one container per microservice). This guarantees a conflict-free service deployment and offers better service isolation. Scaling the system in this setting is as simple as spinning up new containers for microservices under stress (NFR1). Figure 4 depicts the deployment diagram of our TaaS.

5. Evaluation

5.1. Case study: software integrity protection taxonomy

As an empirical evaluation, we attempt to re-create the existing software integrity protection taxonomy (Ahmadvand et al., 2018) using our TaaS.

In their publication, the authors present three different views of their taxonomy, viz. a 3-dimensional view with a zoomed-in view of each dimension, a matrix view, and eight correlation views. We were able to plot the three views and correlations successfully. Space limitations prevent us from including the generated figures.

5.2. Efficiency

To carry out performance measurements, we use a MacBook Pro running macOS High Sierra 10.13 (64-bit) with an Intel i5 2.90 GHz CPU and 16 GB of RAM.

We notice that the matrix view incorporates all citations and concepts in a taxonomy, and thus it could potentially underperform as the size of the taxonomy grows. All other views scale linearly.

5.2.1. Matrix creation

To identify upper bounds, we measure the elapsed time in the creation of a set of $n \times n$ matrices, where $10 \le n \le 200$. We randomly create these matrices initialized with dummy concepts, half of which are set to be correlated.

For each value of $n$, we create 10 distinct random matrices and subsequently average their creation times, yielding one value per $n$. The outcome of this experiment is plotted in Figure 5. These results confirm that matrix creation scales linearly in the size of matrices, i.e., $n$.

5.3. Effectiveness

We use the integrity protection taxonomy as our baseline to evaluate the effectiveness of our keyword matching techniques. As the first step, we remove all the mapped articles from the taxonomy. Then, to compare the conformity of the automated imports to the manual ones, we import the same set of articles using each of the keyword matching techniques.

Throughout the experiment, we keep the minimal similarity constant: $0.9$ for Dice’s coefficient, $1$ for Levenshtein distance, and $-150$ for Fuzzysort. For fairness, we define no synonyms for the taxonomy concepts. In practice, users should define synonyms to further improve the mapping.

In our experiments, we define a parameter called Minimal Occurrence Count ($MOC$). It dictates how many hits of a concept must appear in a paper for the paper to be mapped to the concept. As depicted in Figure 6, we measure conformity for four values of $MOC$: 10, 5, 3, and 1.

The Levenshtein distance and Dice’s coefficient techniques achieve the highest conformity, 78% and 77%, respectively. All of the string similarity methods seem to perform better than regular expressions (regex).
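The role of $MOC$ can be illustrated with a small Go sketch. For simplicity it counts exact, case-insensitive substring hits in place of the similarity-based matching used in the experiments; the function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// CountOccurrences counts the non-overlapping, case-insensitive occurrences
// of concept in text. An empty concept never matches.
func CountOccurrences(text, concept string) int {
	if concept == "" {
		return 0
	}
	return strings.Count(strings.ToLower(text), strings.ToLower(concept))
}

// MapPaper returns the concepts that occur at least moc times in the text,
// in the order in which the concepts were given.
func MapPaper(text string, concepts []string, moc int) []string {
	var mapped []string
	for _, c := range concepts {
		if CountOccurrences(text, c) >= moc {
			mapped = append(mapped, c)
		}
	}
	return mapped
}

func main() {
	// The text and concept names are made up for illustration.
	text := "Hashing protects code integrity. Hashing is cheap; a checksum is cheaper."
	concepts := []string{"hashing", "checksum", "obfuscation"}
	fmt.Println(MapPaper(text, concepts, 1)) // [hashing checksum]
	fmt.Println(MapPaper(text, concepts, 2)) // [hashing]
}
```

A higher $MOC$ trades recall for precision: concepts mentioned only in passing are no longer mapped, at the cost of missing papers that name a concept rarely.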

6. Related Work

Katifori et al. (2007) categorize taxonomy visualization techniques by visualization concept into indented list, node-link and tree, zoomable, space-filling, 3D information landscapes, and matrix-based. A technique can have functionalities from multiple categories. Most of the existing tools are domain-specific and focus on specific aspects and tasks (Lohmann et al., 2016). In contrast, our platform, besides generic visualization, supports review, analysis, and maintenance tasks. Moreover, none of the published visualization techniques or SLR tools displays the complete hierarchy in the matrix, which is crucial for researchers to understand the context of a correlation analysis.

7. Conclusions

Our tool chain automates collective taxonomy creation, maintenance, and, more importantly, analysis. It offers a wide range of tools to aid the identification of research gaps.

We incorporated a set of requirements in the design of our TaaS based on the state of the art and our first-hand experience with developing taxonomies. Our evaluations indicate that our TaaS is both effective and efficient for developing UML-conforming taxonomies.

As future work, we plan to support Eclipse Modeling Framework models (non-UML relationships).

8. Availability

Our TaaS is freely available to the public at https://www22.in.tum.de/tools/integrity-taxonomy. The ease of deployment of our platform also makes an on-premises installation a viable alternative. All source code is publicly available on GitHub at https://github.com/mr-ma/paper-review-go.

Bibliography

rel ([n.d.]) [n.d.]. Elsevier publishing – a look at the numbers, and more. https://www.elsevier.com/connect/elsevier-publishing-a-look-at-the-numbers-and-more Accessed: 2018-04-22.
Ahmadvand et al. (2018) Mohsen Ahmadvand, Alexander Pretschner, and Florian Kelbert. 2018. A Taxonomy of Software Integrity Protection Techniques. Elsevier. https://doi.org/10.1016/bs.adcom.2017.12.007
Gomaa and Fahmy (2013) Wael H. Gomaa and Aly A. Fahmy. 2013. A survey of text similarity approaches. International Journal of Computer Applications 68, 13 (2013).
Katifori et al. (2007) Akrivi Katifori, Constantin Halatsis, George Lepouras, Costas Vassilakis, and Eugenia Giannopoulou. 2007. Ontology Visualization Methods – A Survey.
Kitchenham and Charters (2007) B. Kitchenham and S. Charters. 2007. Guidelines for performing Systematic Literature Reviews in Software Engineering.
Lohmann et al. (2016) Steffen Lohmann, Stefan Negru, Florian Haag, and Thomas Ertl. 2016. Visualizing ontologies with VOWL. Semantic Web 7, 4 (2016), 399–419.
Parreiras et al. (2010) Fernando Silva Parreiras, Tobias Walter, and Gerd Gröner. 2010. Visualizing Ontologies with UML-like Notation. In Ontology-Driven Software Engineering (ODiSE’10). ACM, New York, NY, USA, Article 4, 6 pages. https://doi.org/10.1145/1937128.1937132