Conical Classification For Computationally Efficient One-Class Topic Determination

Sameer Khanna

arXiv:2111.00375·cs.AI·November 2, 2021

Open Access

TL;DR

This paper introduces Conical classification, a computationally efficient method for one-class topic detection in large text datasets, improving predictive power and speed over existing methods.

Contribution

The paper proposes Conical classification and Normal Exclusion, novel techniques that enhance efficiency and accuracy in one-class text classification tasks.
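For context, Normal Exclusion is described as a modification of Bi-Normal Separation (BNS). The sketch below shows only the standard BNS feature score, |F^-1(tpr) - F^-1(fpr)| with F^-1 the inverse standard normal CDF. The function name, argument names, and the clipping constant `eps` are illustrative assumptions, and the modification that defines Normal Exclusion is not described in this summary, so it is not attempted here.

```python
from statistics import NormalDist


def bi_normal_separation(tp, fp, pos, neg, eps=0.0005):
    """Standard BNS score for one term (illustrative, not Normal Exclusion).

    tp / fp: positive / negative documents containing the term.
    pos / neg: total positive / negative documents.
    eps: assumed clipping bound that keeps the inverse CDF finite.
    """
    inv = NormalDist().inv_cdf
    # Clip both rates away from 0 and 1, where the inverse CDF is undefined.
    tpr = min(max(tp / pos, eps), 1 - eps)
    fpr = min(max(fp / neg, eps), 1 - eps)
    return abs(inv(tpr) - inv(fpr))
```

Note that plain BNS needs counts from a negative class, which a one-class setting does not provide; this is presumably part of what a one-class variant has to address.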

Findings

01 Higher predictive accuracy on tested datasets

02 Faster computation compared to existing methods

03 Effective identification of topic-specific documents

Abstract

As the Internet grows in size, so does the amount of text-based information that exists. For many application spaces it is paramount to isolate and identify texts that relate to a particular topic. While one-class classification would be ideal for such analysis, there is a relative lack of research regarding efficient approaches with high predictive power. By noting that the range of documents we wish to identify can be represented as positive linear combinations of the Vector Space Model representing our text, we propose Conical classification, an approach that allows us to identify whether a document is of a particular topic in a computationally efficient manner. We also propose Normal Exclusion, a modified version of Bi-Normal Separation that makes it more suitable within the one-class classification context. We show in our analysis that our approach not only has higher predictive power…
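The idea of "positive linear combinations" can be made concrete with a small sketch. Assuming on-topic training documents are given as term-weight vectors, a new document lies in their cone if it can be written as a sum of those vectors with non-negative coefficients. The code below tests this naively by solving a non-negative least-squares problem with projected gradient descent. The function names, tolerance, iteration count, and toy vectors are all hypothetical, and this is not the algorithm from the paper, which the abstract does not spell out.

```python
def residual_to_cone(generators, x, iters=2000):
    """Approximate distance from x to the cone {sum_i c_i * g_i : c_i >= 0}."""
    n, dim = len(generators), len(x)
    # Gram matrix G[i][j] = g_i . g_j and vector b[i] = g_i . x.
    gram = [[sum(a * b for a, b in zip(gi, gj)) for gj in generators]
            for gi in generators]
    b = [sum(a * v for a, v in zip(g, x)) for g in generators]
    # trace(G) bounds the largest eigenvalue of G, so 1/trace is a safe step.
    step = 1.0 / max(sum(gram[i][i] for i in range(n)), 1e-12)
    c = [0.0] * n
    for _ in range(iters):
        # Gradient of 0.5 * ||sum_i c_i g_i - x||^2 with respect to c.
        grad = [sum(gram[i][j] * c[j] for j in range(n)) - b[i]
                for i in range(n)]
        # Gradient step, then projection onto c >= 0.
        c = [max(0.0, ci - step * gi) for ci, gi in zip(c, grad)]
    approx = [sum(c[i] * generators[i][k] for i in range(n))
              for k in range(dim)]
    return sum((a - v) ** 2 for a, v in zip(approx, x)) ** 0.5


def in_cone(generators, x, tol=1e-3):
    """True if x is (numerically) a non-negative combination of generators."""
    norm = sum(v * v for v in x) ** 0.5
    return norm > 0 and residual_to_cone(generators, x) / norm <= tol


# Toy term-weight vectors standing in for on-topic training documents.
topic_docs = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
on_topic = in_cone(topic_docs, [3.0, 1.0, 0.0])   # equals 2*g1 + 1*g2
off_topic = in_cone(topic_docs, [0.0, 1.0, 0.0])  # would need a negative weight
```

An iterative solver like this is chosen only to keep the example dependency-free; it says nothing about the efficiency claims made for the paper's method.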

Taxonomy

Topics: Topic Modeling · Text and Document Classification Technologies · Network Security and Intrusion Detection