Hierarchical thematic classification of major conference proceedings
Arsentii Kuzmin, Alexander Aduenko, and Vadim Strijov

TL;DR
This paper introduces a hierarchical text classification system that uses a weighted similarity function and Bayesian inference to improve topic relevance ranking in conference proceedings.
Contribution
It presents a novel weighted hierarchical similarity function combined with a Bayesian inference approach for hierarchical text classification.
Findings
Outperforms hierarchical SVM, PLSA, and naive Bayes in ranking accuracy.
Uses entropy-based word importance for weighting.
Provides a closed-form EM algorithm for parameter estimation.
Abstract
In this paper, we develop a decision support system for hierarchical text classification. We consider text collections with a fixed hierarchical structure of topics given by experts in the form of a tree. The system sorts the topics by relevance to a given document. The experts choose one of the most relevant topics to finish the classification. We propose a weighted hierarchical similarity function to calculate topic relevance. The function calculates the similarity of a document and a tree branch. The weights in this function determine word importance. We use the entropy of words to estimate the weights. The proposed hierarchical similarity function formulates a joint hierarchical thematic classification probability model of the document topics, parameters, and hyperparameters. Variational Bayesian inference gives a closed-form EM algorithm. The EM algorithm estimates the…
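The ranking idea in the abstract (score a document against each tree branch, with entropy-derived word weights) can be illustrated with a minimal sketch. This is not the authors' model: the exact similarity function, the weight formula, and the variational EM step are not given in the text above. The sketch assumes a weight of one minus the normalized entropy of a word's distribution over leaf topics, a weighted cosine similarity, and a branch score equal to the average similarity over the nodes from the leaf up to its top-level ancestor. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def word_entropy_weights(docs_by_topic):
    """Assumed weighting: 1 - H(topic | word) / log(K).

    Words concentrated in a single leaf topic get weight close to 1,
    words spread evenly over all K leaf topics get weight close to 0.
    """
    topics = list(docs_by_topic)
    counts = {t: Counter(w for doc in docs_by_topic[t] for w in doc) for t in topics}
    vocab = set().union(*counts.values())
    log_k = math.log(len(topics)) if len(topics) > 1 else 1.0
    weights = {}
    for w in vocab:
        total = sum(counts[t][w] for t in topics)
        entropy = 0.0
        for t in topics:
            p = counts[t][w] / total
            if p > 0:
                entropy -= p * math.log(p)
        weights[w] = 1.0 - entropy / log_k
    return weights


def branch(leaf, parent):
    """Return the nodes from a leaf up to its top-level ancestor."""
    path = [leaf]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def node_profiles(docs_by_topic, parent):
    """Word counts per tree node; an inner node aggregates its descendants."""
    profiles = {}
    for leaf, docs in docs_by_topic.items():
        counts = Counter(w for doc in docs for w in doc)
        for node in branch(leaf, parent):
            profiles.setdefault(node, Counter()).update(counts)
    return profiles


def weighted_similarity(doc, profile, weights):
    """Weighted cosine similarity between a token list and a node profile."""
    d = Counter(doc)
    num = sum(weights.get(w, 0.0) * c * profile.get(w, 0) for w, c in d.items())
    d_norm = math.sqrt(sum(weights.get(w, 0.0) * c * c for w, c in d.items()))
    p_norm = math.sqrt(sum(weights.get(w, 0.0) * c * c for w, c in profile.items()))
    return num / (d_norm * p_norm) if d_norm and p_norm else 0.0


def rank_topics(doc, docs_by_topic, parent):
    """Sort leaf topics by the average similarity along their branch."""
    weights = word_entropy_weights(docs_by_topic)
    profiles = node_profiles(docs_by_topic, parent)
    scores = {}
    for leaf in docs_by_topic:
        path = branch(leaf, parent)
        scores[leaf] = sum(
            weighted_similarity(doc, profiles[n], weights) for n in path
        ) / len(path)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy two-level tree (hypothetical data): leaf topic -> parent topic.
parent = {"optimization": "math", "statistics": "math", "logistics": "business"}
train = {
    "optimization": [["convex", "gradient", "solver"], ["gradient", "descent", "convergence"]],
    "statistics": [["bayesian", "prior", "posterior"], ["variance", "estimator", "prior"]],
    "logistics": [["warehouse", "routing", "fleet"], ["routing", "delivery", "cost"]],
}

ranking = rank_topics(["bayesian", "posterior", "estimator"], train, parent)
for topic, score in ranking:
    print(f"{topic}: {score:.3f}")
```

On this toy data the matching leaf ranks first, and its sibling under the same parent ranks above the leaf from the unrelated branch, because the shared parent node contributes to both branch scores. That is the practical effect of scoring whole branches rather than isolated leaves: an expert reviewing the sorted list sees near misses from the correct part of the tree before unrelated topics.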
Taxonomy
Topics: Innovation and Knowledge Management · Competitive and Knowledge Intelligence · Regional Economic Development and Innovation
Methods: Support Vector Machine
