A centroid-based framework for text classification in ITSM environments
Hossein Mohanna, Ali Ait-Bachir

TL;DR
This paper introduces a dual-embedding, centroid-based text classification framework for hierarchical ITSM support tickets. It achieves accuracy competitive with SVMs while training and updating substantially faster, which makes it suitable for real-world deployment.
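One plausible reason updates are cheap is how little state a centroid classifier keeps. The sketch below assumes each category centroid is the plain mean of its tickets' embedding vectors; the paper's exact update rule is not described here. Under that assumption, a newly labeled ticket changes only its own category's running sum and count, and no other category needs retraining. `CentroidStore` and its methods are hypothetical names.

```python
import numpy as np


class CentroidStore:
    """Hypothetical per-category centroid store with O(d) incremental updates.

    Assumption: a centroid is the mean of the embeddings of the tickets
    assigned to that category. The paper's actual update rule may differ.
    """

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.sums: dict[str, np.ndarray] = {}  # running sum of vectors per category
        self.counts: dict[str, int] = {}       # number of tickets per category

    def add(self, label: str, vec: np.ndarray) -> None:
        # Adding one ticket touches only its own category: bump sum and count.
        if label not in self.sums:
            self.sums[label] = np.zeros(self.dim)
            self.counts[label] = 0
        self.sums[label] += vec
        self.counts[label] += 1

    def centroid(self, label: str) -> np.ndarray:
        # The centroid is recomputed on demand from the running statistics.
        return self.sums[label] / self.counts[label]
```

For example, adding `[1, 0, 0]` and `[0, 1, 0]` under the same label yields a centroid of `[0.5, 0.5, 0]`. A batch-trained multi-class SVM, by contrast, typically has to be refit to absorb new training data.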
Contribution
The paper presents a novel dual-embedding centroid-based approach that combines semantic and lexical information for hierarchical text classification in ITSM environments, enhancing interpretability and efficiency.
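To make the dual-embedding idea concrete, the sketch below keeps one centroid matrix per embedding view and ranks categories by cosine similarity within each view. It assumes the semantic vectors come from some sentence encoder and the lexical vectors from something like TF-IDF, both precomputed as NumPy arrays. The helper names `build_centroids` and `rank_categories` are hypothetical, and normalizing centroids to unit length is a design assumption, not a detail taken from the paper.

```python
import numpy as np


def build_centroids(embs: np.ndarray, labels: list[str]) -> tuple[list[str], np.ndarray]:
    """Return sorted category names and one unit-length mean vector per category.

    `embs` has one row per training ticket; `labels[i]` is the category of row i.
    """
    cats = sorted(set(labels))
    label_arr = np.array(labels)
    # Mean of all rows belonging to each category, stacked into a (n_cats, d) matrix.
    cents = np.stack([embs[label_arr == c].mean(axis=0) for c in cats])
    # L2-normalize rows so a dot product with a unit query equals cosine similarity.
    cents /= np.linalg.norm(cents, axis=1, keepdims=True)
    return cats, cents


def rank_categories(query: np.ndarray, cats: list[str], cents: np.ndarray) -> list[str]:
    """Rank categories by cosine similarity between the query and each centroid."""
    q = query / np.linalg.norm(query)
    sims = cents @ q
    order = np.argsort(-sims)  # indices sorted by descending similarity
    return [cats[i] for i in order]
```

Calling `build_centroids` once on the semantic matrix and once on the lexical matrix produces the two separate centroid sets. A query ticket is then embedded in both spaces and ranked against each set independently, giving two ranked category lists to fuse. A category whose vectors average to zero would produce a zero-norm centroid, which this sketch does not handle.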
Findings
Achieves a hierarchical F1 of 0.731, comparable to SVMs (0.727).
Provides 5.9x faster training and up to 152x faster incremental updates.
Offers 8.6-8.8x speedups in batch processing (100-1000 samples, excluding embedding computation), suitable for production use.
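The hierarchical F1 reported above gives partial credit when a prediction lands in the correct branch of the taxonomy but at the wrong leaf. The text here does not say which hierarchical F1 variant is used. The sketch below assumes the common ancestor-augmented definition: each label is expanded to itself plus all of its ancestors, and precision and recall are micro-averaged over those sets. Labels are assumed to be slash-separated paths such as `hw/printer/jam`, and the function names are hypothetical.

```python
def ancestors(path: str, sep: str = "/") -> set[str]:
    """Return the node plus all its ancestors, e.g. 'a/b/c' -> {'a', 'a/b', 'a/b/c'}."""
    parts = path.split(sep)
    return {sep.join(parts[: i + 1]) for i in range(len(parts))}


def hierarchical_f1(true_paths: list[str], pred_paths: list[str]) -> float:
    """Micro-averaged hierarchical F1 over ancestor-augmented label sets (assumed variant)."""
    if len(true_paths) != len(pred_paths):
        raise ValueError("true_paths and pred_paths must have the same length")
    overlap = pred_total = true_total = 0
    for t, p in zip(true_paths, pred_paths):
        t_set, p_set = ancestors(t), ancestors(p)
        overlap += len(t_set & p_set)  # shared taxonomy nodes
        pred_total += len(p_set)
        true_total += len(t_set)
    if overlap == 0:
        return 0.0  # covers empty input and fully disjoint predictions
    h_precision = overlap / pred_total
    h_recall = overlap / true_total
    return 2 * h_precision * h_recall / (h_precision + h_recall)
```

For instance, predicting `hw/printer/toner` for a ticket labeled `hw/printer/jam` shares the ancestors `hw` and `hw/printer`. Hierarchical precision and recall are therefore both 2/3, and hierarchical F1 is about 0.667, whereas flat accuracy would count the prediction as simply wrong.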
Abstract
Text classification with hierarchical taxonomies is a fundamental requirement in IT Service Management (ITSM) systems, where support tickets must be categorized into tree-structured taxonomies. We present a dual-embedding centroid-based classification framework that maintains separate semantic and lexical centroid representations per category, combining them through reciprocal rank fusion at inference time. The framework achieves performance competitive with Support Vector Machines (hierarchical F1: 0.731 vs 0.727) while providing interpretability through centroid representations. Evaluated on 8,968 ITSM tickets across 123 categories, the method achieves 5.9 times faster training, up to 152 times faster incremental updates, and an 8.6-8.8 times speedup across batch sizes of 100-1000 samples when embedding computation is excluded. These results make the method suitable for production ITSM…
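Reciprocal rank fusion combines the semantic and lexical rankings using only rank positions, so the raw similarity scores of the two views never need to be calibrated against each other. The sketch below assumes the standard RRF formula, score(c) = Σ 1/(k + rank(c)) with 1-based ranks, and the constant k = 60 often used in the RRF literature. The paper's choice of k is not given here, and `reciprocal_rank_fusion` is a hypothetical name.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked category lists with reciprocal rank fusion.

    Each category scores sum(1 / (k + rank)) over the rankings it appears in,
    with rank starting at 1. k=60 is an assumed default, not the paper's value.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, cat in enumerate(ranking, start=1):
            scores[cat] = scores.get(cat, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by category name for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))
```

With a semantic ranking `["sw/email", "hw/printer", "net/vpn"]` and a lexical ranking `["hw/printer", "net/vpn", "sw/email"]`, the fused order is `["hw/printer", "sw/email", "net/vpn"]`. `hw/printer` sits near the top of both lists, so its summed reciprocal ranks (1/61 + 1/62) beat those of `sw/email` (1/61 + 1/63), which tops only the semantic list.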
Taxonomy
Topics: Text and Document Classification Technologies · Software System Performance and Reliability · Data Quality and Management
