Identifier Namespaces in Mathematical Notation

Alexey Grigorev

arXiv:1601.03354 · cs.IR · January 14, 2016 · 1 citation

TL;DR

This paper proposes a novel method for automatically discovering identifier namespaces in mathematical notation by applying document clustering techniques to group identifiers, validated on source code and Wikipedia datasets.

Contribution

It introduces the first dataset and approach for automatic namespace discovery in mathematical notation, adapting document clustering methods to this new problem.

Findings

01. Partial recovery of namespaces from source code using identifiers
02. Effective extraction of namespaces from Wikipedia articles across languages
03. Hierarchical organization of namespaces using existing classification schemes

Abstract

In this thesis, we look at the problem of assigning each identifier of a document to a namespace. At present, no dataset exists in which all identifiers are grouped into namespaces, so we need to create one ourselves. To do that, we need to find groups of documents that use identifiers in the same way, which can be done with cluster analysis methods. We argue that documents can be represented by the identifiers they contain, an approach similar to representing textual information in the Vector Space Model. Because of this, we can apply traditional document clustering techniques to namespace discovery. Because the problem is new, there is no gold standard dataset, and it is hard to evaluate the performance of our method. To overcome this, we first use Java source code as a dataset for our experiments, since it contains the namespace…
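The abstract's central idea, representing each document by the identifiers it contains so that standard Vector Space Model clustering applies, can be illustrated with a small sketch. This is not the thesis's actual pipeline. The toy corpus, the TF-IDF weighting, the choice of spherical (cosine) k-means with farthest-first initialization, and every function name (tfidf_vectors, spherical_kmeans, top_identifiers) are hypothetical choices made only for illustration. Each resulting cluster is read as a candidate namespace, and its highest-weighted identifiers serve as a rough label.

```python
import math
from collections import Counter

# Hypothetical toy corpus: each "document" is reduced to the identifiers it uses.
DOCS = [
    ["E", "m", "c", "v", "t"],      # relativity-style formulas
    ["F", "m", "a", "v", "t"],      # classical mechanics
    ["E", "m", "c", "p"],           # energy-momentum
    ["x", "mu", "sigma", "n"],      # statistics
    ["mu", "sigma", "n", "X"],      # statistics
    ["x", "sigma", "n", "p"],       # statistics (p reused with another meaning)
]


def normalise(vec):
    """Scale a sparse vector (dict) to unit L2 length; leave all-zero vectors as-is."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def tfidf_vectors(docs):
    """Represent each document as a unit-length TF-IDF vector over its identifiers."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each identifier once per doc
    return [
        normalise({t: tf * math.log(n / df[t]) for t, tf in Counter(doc).items()})
        for doc in docs
    ]


def cosine(u, v):
    """Dot product of two sparse vectors; equals cosine similarity for unit vectors."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def spherical_kmeans(vectors, k, iterations=20):
    """Cluster unit vectors by cosine similarity (k-means with normalised centroids)."""
    # Deterministic farthest-first initialisation, starting from the first document.
    centroids = [vectors[0]]
    while len(centroids) < k:
        far = min(range(len(vectors)),
                  key=lambda i: max(cosine(vectors[i], c) for c in centroids))
        centroids.append(vectors[far])
    labels = None
    for _ in range(iterations):
        new_labels = [max(range(k), key=lambda j: cosine(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            total = Counter()
            for v, lab in zip(vectors, labels):
                if lab == j:
                    total.update(v)  # sum member vectors component-wise
            if total:  # keep the old centroid if a cluster becomes empty
                centroids[j] = normalise(total)
    return labels, centroids


def top_identifiers(centroid, n=3):
    """The n highest-weighted identifiers: a rough label for the candidate namespace."""
    return [t for t, _ in sorted(centroid.items(), key=lambda kv: -kv[1])[:n]]


if __name__ == "__main__":
    labels, centroids = spherical_kmeans(tfidf_vectors(DOCS), k=2)
    print(labels)  # [0, 0, 0, 1, 1, 1]
    for j, c in enumerate(centroids):
        print(j, top_identifiers(c))
```

In this toy run the physics-style and statistics-style documents separate into two clusters, while the identifier p occurs in both, which mirrors the motivation for namespaces: the same symbol can carry different meanings in different groups of documents. Cosine similarity is used because documents with many identifiers would otherwise dominate purely by vector length.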

Taxonomy

Topics: Natural Language Processing Techniques · Advanced Text Analysis Techniques · Web Data Mining and Analysis