Token-Level Graphs for Short Text Classification

Gregor Donabauer; Udo Kruschwitz

arXiv:2412.12754·cs.IR·December 18, 2024

TL;DR

This paper introduces a token-level graph approach for short text classification that leverages pre-trained language models to capture semantic context, improving performance and efficiency over existing graph-based methods.

Contribution

The proposed method constructs text graphs from PLM token embeddings. This captures context-dependent word meanings and requires fewer trainable parameters than classical PLM fine-tuning, which makes training more robust in low-resource classification.
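To make the idea of a token-level graph concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The names `TokenGraph` and `build_token_graph` are hypothetical, the embedding vectors are hand-written placeholders standing in for PLM output, and the rule that connects tokens within a fixed window is an assumption, since this page does not state how edges are formed.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class TokenGraph:
    """Hypothetical container: one node per token occurrence."""
    tokens: List[str]             # node labels, in text order
    features: List[List[float]]   # one feature vector per node
    edges: List[Tuple[int, int]]  # undirected edges stored as (i, j), i < j


def build_token_graph(
    tokens: Sequence[str],
    embeddings: Sequence[Sequence[float]],
    window: int = 1,
) -> TokenGraph:
    """Build a graph whose nodes are token occurrences.

    Assumption (not taken from the paper): two tokens are connected
    when they are at most `window` positions apart in the text.
    """
    if len(tokens) != len(embeddings):
        raise ValueError("need exactly one embedding per token")
    if window < 1:
        raise ValueError("window must be at least 1")

    last = len(tokens) - 1
    edges = []
    for i in range(len(tokens)):
        # Link token i to each following token inside the window.
        for j in range(i + 1, min(i + window, last) + 1):
            edges.append((i, j))

    return TokenGraph(list(tokens), [list(v) for v in embeddings], edges)


# Placeholder vectors: a real pipeline would take these from a PLM.
# The two "bank" occurrences get different vectors, so they become
# two distinct nodes with context-dependent features.
example_tokens = ["river", "bank", "and", "savings", "bank"]
example_embeddings = [
    [0.9, 0.1],
    [0.8, 0.2],  # "bank" near "river"
    [0.5, 0.5],
    [0.1, 0.9],
    [0.2, 0.8],  # "bank" near "savings"
]
graph = build_token_graph(example_tokens, example_embeddings, window=1)
print(len(graph.tokens), "nodes,", len(graph.edges), "edges")
```

Because each token occurrence is its own node, a repeated word keeps separate features for each context, which is the property the contribution statement highlights. A classifier over such a graph (for example a graph neural network) would then operate on `features` and `edges`; that step is left out here.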

Findings

01

Achieves accuracy higher than or comparable to that of existing methods.

02

Demonstrates robustness with few training samples.

03

Provides a publicly available implementation for reproducibility.

Abstract

The classification of short texts is a common subtask in Information Retrieval (IR). Recent advances in graph machine learning have led to interest in graph-based approaches for low resource scenarios, showing promise in such settings. However, existing methods face limitations such as not accounting for different meanings of the same words or constraints from transductive approaches. We propose an approach which constructs text graphs entirely based on tokens obtained through pre-trained language models (PLMs). By applying a PLM to tokenize and embed the texts when creating the graph(-nodes), our method captures contextual and semantic information, overcomes vocabulary constraints, and allows for context-dependent word meanings. Our approach also makes classification more efficient with reduced parameters compared to classical PLM fine-tuning, resulting in more robust training with few…

Code & Models

Repositories

dogregor/tokengraph
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Text and Document Classification Technologies