Scalable Multi-phase Word Embedding Using Conjunctive Propositional Clauses

Ahmed K. Kadhim; Lei Jiao; Rishad Shafik; Ole-Christoffer Granmo; Bimal Bhattarai

arXiv:2501.19018·cs.LG·October 20, 2025

Open Access 3 Reviews

TL;DR

This paper introduces a two-phase training method for Tsetlin Machine word embeddings that improves scalability while preserving interpretability and achieves competitive results on NLP tasks.

Contribution

The paper presents a novel two-phase training approach for Tsetlin Machine-based word embeddings, addressing scalability issues and enhancing interpretability in NLP applications.

Findings

01

The method achieves competitive performance on benchmark datasets.

02

It maintains interpretability of the embeddings.

03

The approach is effective for sentiment analysis on IMDB.

Abstract

The Tsetlin Machine (TM) architecture has recently demonstrated effectiveness in Machine Learning (ML), particularly within Natural Language Processing (NLP). It has been utilized to construct word embeddings using conjunctive propositional clauses, thereby significantly enhancing our understanding and interpretation of machine-derived decisions. The previous approach performed the word embedding over a sequence of input words to consolidate the information into a cohesive and unified representation. However, that approach encounters scalability challenges as the input size increases. In this study, we introduce a novel approach incorporating two-phase training to discover contextual embeddings of input sequences. Specifically, this method encapsulates the knowledge for each input word within the dataset's vocabulary, subsequently constructing embeddings for a sequence of input words…
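
To make "conjunctive propositional clauses" concrete, the following is a minimal Python sketch of how one such clause can be evaluated against a bag-of-words input. It is not the authors' TM-AE implementation and involves no training; the vocabulary, the example clause, and the names `Clause`, `to_features`, `evaluate`, and `matches` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, FrozenSet, Iterable, List, NamedTuple


class Clause(NamedTuple):
    """A conjunction of word literals (hypothetical structure, for illustration)."""
    positive: FrozenSet[str]  # words that must be present in the input
    negated: FrozenSet[str]   # words that must be absent from the input


def to_features(tokens: Iterable[str], vocabulary: List[str]) -> Dict[str, bool]:
    """Binarize a token sequence: one presence flag per vocabulary word."""
    present = set(tokens)
    return {word: word in present for word in vocabulary}


def evaluate(clause: Clause, features: Dict[str, bool]) -> bool:
    """A clause fires only if every literal in the conjunction is satisfied."""
    return (all(features[w] for w in clause.positive)
            and not any(features[w] for w in clause.negated))


# Assumed toy vocabulary and a hand-written clause: good AND movie AND NOT boring.
VOCABULARY = ["good", "bad", "movie", "plot", "boring"]
CLAUSE = Clause(positive=frozenset({"good", "movie"}),
                negated=frozenset({"boring"}))


def matches(sentence: str) -> bool:
    """Check whether the example clause fires on a whitespace-tokenized sentence."""
    return evaluate(CLAUSE, to_features(sentence.lower().split(), VOCABULARY))


if __name__ == "__main__":
    for sentence in ("a good movie with a solid plot",
                     "a good but boring movie",
                     "bad plot"):
        print(f"{sentence!r}: {matches(sentence)}")
```

Because the clause is a plain AND of present and absent words, it can be read directly as a rule, which is the kind of interpretability the abstract refers to. How clauses are learned and combined into embeddings is described in the paper itself and is not reproduced here.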

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01 · Rating 5 · Confidence 2

Strengths

This paper has several notable strengths:
1. It introduces a relatively novel approach by employing the Tsetlin Machine automaton to generate word embeddings, distinguishing it from more conventional methods.
2. The paper provides a thorough explanation of the CoTM architecture, TM automaton, TM-AE training process, and the unique two-phase TM-AE training schema.
3. The authors highlight a computational efficiency feature: by skipping the first phase of TM-AE when updating a single word, one can

Weaknesses

The paper has several areas where it could be improved:
1. Although the model reportedly required 6 months of training on a DGX H100 machine, the paper lacks an analysis of the computational time and space complexity of the proposed method, particularly in comparison to other approaches like GloVe, Word2Vec, and FastText.
2. While the paper emphasizes the scalability and transparency of the two-phase TM-AE approach, further explanation and analysis of these properties would strengthen the argume

Reviewer 02 · Rating 3 · Confidence 4

Strengths

A novel Tsetlin Machine-based approach incorporating two-phase training to discover contextual embeddings of input sequences.

Weaknesses

- The main motivation is not clearly expressed; the introduction does not outline the differences and advantages over previous methods, only presenting an overview of prior work. The CoTM method is also not original, thus the interpretability mentioned at the beginning is also not a contribution of this paper.
- The introduction mentions the issue of long training time, but the experiments section does not seem to analyze the efficiency issue.
- The experiments are conducted only on a sentimen

Reviewer 03 · Rating 5 · Confidence 2

Strengths

- The two-phase approach to word embedding with TM is novel, especially using conjunctive propositional clauses, which adds interpretability.
- The use of Tsetlin Machines enables a more transparent and interpretable embedding process compared to deep learning models.
- Practical Application in Sentiment Analysis: The embeddings show utility in real-world tasks, as demonstrated through sentiment analysis with data augmentation.

Weaknesses

- The results don’t look good; the model performs poorly on Spearman and Kendall correlations, which suggests it struggles to capture the ranking of word pairs.
- I guess Algorithm 1 doesn’t seem to be a direct contribution of this paper, so it might be better left out of the main text or put into Appendix.
- The paper could use more polish. For example, Tables 1 and 2 are missing top and bottom lines, which affects readability.
- The evaluation is pretty narrow, focusing mostly on basic similar

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems