Contextual Categorization Enhancement through LLMs Latent-Space
Zineddine Bettouche, Anas Safi, Andreas Fischer

TL;DR
This paper introduces a method using transformer-based latent space representations to improve semantic categorization in large textual datasets like Wikipedia, employing convex hulls and hierarchical structures to enhance category quality.
Contribution
It presents a novel approach combining transformer encodings with geometric and hierarchical methods to assess and improve semantic category integrity in large datasets.
Findings
Effective semantic filtering using Euclidean distance-based decay functions.
Enhanced identification of outliers and data groupings in Wikipedia categories.
Improved categorization quality through latent space analysis.
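The outlier identification mentioned above can be illustrated with a small convex-hull sketch. It assumes the category encodings have already been reduced to 2D points; the hull-membership test as an outlier check, and the helper names `convex_hull` and `inside_hull`, are illustrative assumptions rather than the paper's procedure.

```python
def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop the last vertex while it would make a clockwise or collinear turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]

def inside_hull(hull, q):
    """True if q lies inside or on the boundary of a counter-clockwise hull."""
    n = len(hull)
    if n < 3:
        return False
    return all(cross(hull[i], hull[(i + 1) % n], q) >= 0 for i in range(n))

# Hypothetical 2D projections of the articles in one category.
category_points = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
hull = convex_hull(category_points)
```

Under this assumption, a point for which `inside_hull(hull, q)` is false would be a candidate outlier relative to that category's region.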
Abstract
Managing the semantic quality of the categorization in large textual datasets, such as Wikipedia, presents significant challenges in terms of complexity and cost. In this paper, we propose leveraging transformer models to distill semantic information from texts in the Wikipedia dataset and its associated categories into a latent space. We then explore different approaches based on these encodings to assess and enhance the semantic identity of the categories. Our graphical approach is powered by Convex Hull, while we utilize Hierarchical Navigable Small Worlds (HNSWs) for the hierarchical approach. As a solution to the information loss caused by the dimensionality reduction, we modulate the following mathematical solution: an exponential decay function driven by the Euclidean distances between the high-dimensional encodings of the textual categories. This function represents a filter…
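A minimal sketch of an exponential decay filter driven by Euclidean distance follows. The abstract is truncated before it states the function, so the form exp(-rate * d), the `rate` and `threshold` parameters, and the function names are assumptions made for illustration only.

```python
import math

def decay_weight(u, v, rate=1.0):
    """Assumed form: exp(-rate * Euclidean distance between encodings u and v)."""
    return math.exp(-rate * math.dist(u, v))

def filter_related(anchor, candidates, rate=1.0, threshold=0.5):
    """Keep candidates whose decay weight relative to the anchor meets the threshold."""
    kept = {}
    for name, vec in candidates.items():
        w = decay_weight(anchor, vec, rate)
        if w >= threshold:
            kept[name] = w
    return kept

# Hypothetical low-dimensional stand-ins for high-dimensional category encodings.
anchor = [0.0, 0.0]
candidates = {"near": [0.1, 0.0], "far": [3.0, 4.0]}
kept = filter_related(anchor, candidates)
```

Because the weight is computed from distances between the original encodings, it does not depend on any reduced projection; a larger `rate` makes the filter stricter.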
Taxonomy
Topics: Text and Document Classification Technologies · Advanced Computational Techniques and Applications
Methods: Exponential Decay
