Token-Level Graphs for Short Text Classification
Gregor Donabauer, Udo Kruschwitz

TL;DR
This paper introduces a token-level graph approach for short text classification that leverages pre-trained language models to capture semantic context, improving performance and efficiency over existing graph-based methods.
Contribution
The proposed method constructs text graphs from PLM token embeddings, so each node carries the contextual meaning of its token. It also uses fewer parameters than classical PLM fine-tuning, which makes training more robust in low-resource classification.
Findings
Achieves accuracy higher than or comparable to that of existing methods.
Demonstrates robustness with few training samples.
Provides a publicly available implementation for reproducibility.
Abstract
The classification of short texts is a common subtask in Information Retrieval (IR). Recent advances in graph machine learning have led to interest in graph-based approaches for low-resource scenarios, showing promise in such settings. However, existing methods face limitations such as not accounting for different meanings of the same words or constraints from transductive approaches. We propose an approach that constructs text graphs entirely based on tokens obtained through pre-trained language models (PLMs). By applying a PLM to tokenize and embed the texts when creating the graph(-nodes), our method captures contextual and semantic information, overcomes vocabulary constraints, and allows for context-dependent word meanings. Our approach also makes classification more efficient with reduced parameters compared to classical PLM fine-tuning, resulting in more robust training with few…
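The graph construction described in the abstract can be illustrated with a minimal sketch. It assumes one graph per text, one node per token occurrence, and undirected edges between tokens that lie within a fixed window of each other. The window-based connectivity, the function name build_token_graph, and the two-dimensional embedding values are assumptions made for illustration: the abstract does not state how edges are drawn, and in practice the node features would come from a PLM rather than being written by hand.

```python
from typing import List, Tuple


def build_token_graph(
    tokens: List[str],
    embeddings: List[List[float]],
    window: int = 1,
) -> Tuple[List[List[float]], List[Tuple[int, int]]]:
    """Build a graph for one text (hypothetical helper, not the authors' code).

    Nodes are token occurrences, so a repeated word yields separate nodes,
    each with its own context-dependent embedding. Edges link tokens whose
    positions differ by at most `window` (assumed connectivity rule).
    """
    if len(tokens) != len(embeddings):
        raise ValueError("need exactly one embedding per token")

    edges: List[Tuple[int, int]] = []
    for i in range(len(tokens)):
        # Connect token i to the next `window` tokens, in both directions.
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            edges.append((i, j))
            edges.append((j, i))
    return embeddings, edges


# Placeholder data: the vectors stand in for contextual PLM embeddings.
tokens = ["river", "bank", "and", "savings", "bank"]
embeddings = [[0.1, 0.9], [0.2, 0.8], [0.5, 0.5], [0.9, 0.1], [0.8, 0.2]]

node_features, edges = build_token_graph(tokens, embeddings, window=1)
print(len(node_features), "nodes,", len(edges), "directed edges")
```

Because nodes are token occurrences rather than vocabulary entries, the two "bank" tokens keep distinct feature vectors, which is how a token-level graph can represent context-dependent word meanings. A graph neural network would then classify each text from its node features and edge list.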
Taxonomy
Topics: Topic Modeling · Text and Document Classification Technologies
