GuideWalk: A Novel Graph-Based Word Embedding for Enhanced Text Classification
Sarmad N. Mohammed, Semra Gündüç

TL;DR
GuideWalk introduces a graph-based word embedding method that leverages sentence structure to improve text classification accuracy and robustness, especially with limited training data.
Contribution
The paper proposes the Guided Transition Probability Matrix (GTPM), a novel graph-based embedding approach that captures syntactic, semantic, and hidden information for enhanced text classification.
Findings
GTPM outperforms existing embedding algorithms in classification tasks.
GTPM maintains high performance when trained on only 10% of the training data.
With limited training data, GTPM's performance declines by 8%, a smaller drop than the baseline methods show.
Abstract
One of the prime problems of computer science and machine learning is to extract information efficiently from large-scale, heterogeneous data. Text data, with its syntax, semantics, and even hidden information content, holds an exceptional place among the data types of concern. Processing text data requires embedding, a method of translating the content of the text into numeric vectors. A correct embedding algorithm is the starting point for obtaining the full information content of the text data. In this work, a new text embedding approach, namely the Guided Transition Probability Matrix (GTPM) model, is proposed. The model uses the graph structure of sentences to capture different types of information from text data, such as syntactic, semantic, and hidden content. Using random walks on a weighted word graph, GTPM calculates transition probabilities to derive text…
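The general idea of a weighted word graph with random-walk transition probabilities can be illustrated with a minimal sketch. This is not the authors' GTPM algorithm, whose details are not given in the excerpt above. It assumes the simplest possible construction: edge weights count how often two words are adjacent in a sentence, and transition probabilities are the row-normalized weights. The function names (build_word_graph, transition_probabilities, walk_distribution) and the toy sentences are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_word_graph(sentences):
    """Weighted, undirected word graph.

    Assumption for illustration: the weight of an edge is the number of
    times the two words appear next to each other in a sentence.
    """
    weights = defaultdict(lambda: defaultdict(float))
    for sentence in sentences:
        tokens = sentence.lower().split()
        for left, right in zip(tokens, tokens[1:]):
            if left == right:
                continue  # skip self-loops from repeated words
            weights[left][right] += 1.0
            weights[right][left] += 1.0
    return weights


def transition_probabilities(weights):
    """Row-normalize edge weights so each word's outgoing probabilities sum to 1."""
    probs = {}
    for word, neighbours in weights.items():
        total = sum(neighbours.values())
        probs[word] = {n: w / total for n, w in neighbours.items()}
    return probs


def walk_distribution(probs, start, steps):
    """Probability of a random walker being at each word after `steps` steps."""
    dist = {start: 1.0}
    for _ in range(steps):
        nxt = defaultdict(float)
        for word, mass in dist.items():
            for neighbour, p in probs[word].items():
                nxt[neighbour] += mass * p
        dist = dict(nxt)
    return dist


# Toy corpus (hypothetical data, not from the paper).
sentences = ["the cat sat on the mat", "the dog sat on the rug"]
graph = build_word_graph(sentences)
probs = transition_probabilities(graph)
two_step = walk_distribution(probs, "cat", steps=2)
```

In this sketch, a word's row of multi-step walk probabilities could serve as a crude numeric vector for that word. How GTPM guides the walk and turns the probabilities into text embeddings is described in the paper itself.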
Taxonomy
Topics: Text and Document Classification Technologies
