Evolutionary Feature-wise Thresholding for Binary Representation of NLP Embeddings

Soumen Sinha, Shahryar Rahnamayan, and Azam Asilian Bidgoli

arXiv:2507.17025·cs.CL·July 24, 2025

Open Access

TL;DR

This paper introduces a coordinate search-based method for optimizing feature-specific thresholds to convert NLP embeddings into binary representations, improving accuracy and efficiency in large-scale NLP tasks.

Contribution

It proposes a novel thresholding framework that identifies optimal per-feature thresholds, enhancing binary encoding performance over fixed-threshold methods.
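To make the contrast between a fixed threshold and per-feature thresholds concrete, here is a minimal sketch in Python with NumPy. The random matrix `X` is a toy stand-in for real embeddings, the helper `binarize` is a hypothetical name, and the per-feature medians are only a placeholder for illustration; they are not the optimized thresholds the paper searches for.

```python
import numpy as np


def binarize(embeddings, thresholds):
    """Return a 0/1 uint8 array: 1 where a feature value exceeds its threshold.

    `thresholds` may be a scalar (one fixed threshold for all features)
    or a vector with one entry per feature; NumPy broadcasting covers both.
    """
    return (embeddings > thresholds).astype(np.uint8)


# Toy stand-in for embeddings: 4 items with 16 real-valued features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 16))

# Fixed threshold: the same cut-off (here 0.0) for every feature.
fixed = binarize(X, 0.0)

# Per-feature thresholds: one cut-off per column.
# Assumption: medians are used only as an illustrative placeholder.
per_feature = binarize(X, np.median(X, axis=0))

# Pack 16 bits into 2 bytes per item to show the storage saving
# relative to 16 floating-point values.
packed = np.packbits(per_feature, axis=1)
```

With bit packing, each 16-feature item occupies 2 bytes instead of 16 floats, which is the kind of storage saving that motivates binary codes.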

Findings

01 Binary embeddings with optimized thresholds outperform traditional methods in accuracy.

02 The approach improves efficiency and accuracy in NLP applications.

03 The method is versatile and applicable beyond NLP embeddings.

Abstract

Efficient text embedding is crucial for large-scale natural language processing (NLP) applications, where storage and computational efficiency are key concerns. In this paper, we explore how binary representations (barcodes) can replace real-valued features in NLP embeddings derived from machine learning models such as BERT. Thresholding is a common method for converting continuous embeddings into binary representations, often using a single fixed threshold across all features. We propose a Coordinate Search-based optimization framework that instead identifies the optimal threshold for each feature, demonstrating that feature-specific thresholds lead to improved performance in binary encoding. This ensures that the binary representations are both accurate and efficient, enhancing performance across various features. Our optimal barcode representations have shown promising…
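The general idea of searching one threshold at a time can be sketched as follows. This is a generic coordinate search under stated assumptions, not the authors' implementation: the objective (leave-one-out 1-nearest-neighbour accuracy under Hamming distance), the quantile-based candidate grid, the starting point (per-feature means), and the names `fitness` and `coordinate_search` are all hypothetical choices made for illustration, and the data is synthetic.

```python
import numpy as np


def fitness(X, y, thresholds):
    """Leave-one-out 1-NN accuracy on binary codes under Hamming distance.

    Assumption: this objective is a stand-in chosen for the sketch.
    """
    B = (X > thresholds).astype(np.int32)
    # Pairwise Hamming distances between all binary codes.
    D = (B[:, None, :] != B[None, :, :]).sum(axis=2)
    # Exclude self-matches by making the diagonal larger than any distance.
    np.fill_diagonal(D, B.shape[1] + 1)
    return float((y[D.argmin(axis=1)] == y).mean())


def coordinate_search(X, y, n_candidates=5, n_sweeps=2):
    """Adjust one feature's threshold at a time, keeping only improvements."""
    thresholds = X.mean(axis=0)  # starting point: per-feature means
    best = fitness(X, y, thresholds)
    for _ in range(n_sweeps):
        for j in range(X.shape[1]):
            # Candidate thresholds for feature j: quantiles of its values.
            grid = np.quantile(X[:, j], np.linspace(0.1, 0.9, n_candidates))
            for c in grid:
                trial = thresholds.copy()
                trial[j] = c
                score = fitness(X, y, trial)
                if score > best:
                    best, thresholds = score, trial
    return thresholds, best


# Synthetic two-class data: only features 0 and 4 carry class signal.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
shift = np.array([1.5, 0, 0, 0, 1.5, 0, 0, 0])
X = rng.normal(size=(40, 8)) + y[:, None] * shift

start = fitness(X, y, X.mean(axis=0))
thresholds, best = coordinate_search(X, y)
```

Because a trial threshold is accepted only when it strictly improves the score, `best` can never fall below `start`. A real evaluation would measure the objective on held-out data rather than on the data used for the search.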

Taxonomy

Topics: Machine Learning and Data Classification