Leveraging Weighted Syntactic and Semantic Context Assessment Summary (wSSAS) Towards Text Categorization Using LLMs
Shreeya Verma Kathuria, Nitin Mayande, Sharookh Daruwalla, Nitin Joglekar, Charles Weber

TL;DR
This paper introduces wSSAS, a deterministic framework that enhances LLM-based text categorization by organizing data hierarchically and prioritizing semantic features, leading to improved accuracy and reproducibility.
Contribution
The paper presents a novel deterministic method, wSSAS, combining hierarchical organization and SNR-based feature prioritization to improve large-scale text categorization with LLMs.
Findings
wSSAS improves clustering integrity across datasets
It significantly reduces categorization entropy
The framework enhances reproducibility of LLM summaries
Abstract
The use of Large Language Models (LLMs) for reliable, enterprise-grade analytics such as text categorization is often hindered by the stochastic nature of attention mechanisms and sensitivity to noise that compromise their analytical precision and reproducibility. To address these technical frictions, this paper introduces the Weighted Syntactic and Semantic Context Assessment Summary (wSSAS), a deterministic framework designed to enforce data integrity on large-scale, chaotic datasets. We propose a two-phased validation framework that first organizes raw text into a hierarchical classification structure containing Themes, Stories, and Clusters. It then leverages a Signal-to-Noise Ratio (SNR) to prioritize high-value semantic features, ensuring the model's attention remains focused on the most representative data points. By incorporating this scoring mechanism into a…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
