Evolutionary Feature-wise Thresholding for Binary Representation of NLP Embeddings
Soumen Sinha, Shahryar Rahnamayan, and Azam Asilian Bidgoli

TL;DR
This paper introduces a coordinate search-based method for optimizing feature-specific thresholds to convert NLP embeddings into binary representations, improving accuracy and efficiency in large-scale NLP tasks.
Contribution
It proposes a novel thresholding framework that identifies optimal per-feature thresholds, enhancing binary encoding performance over fixed-threshold methods.
Findings
Binary embeddings with optimized thresholds outperform traditional methods in accuracy.
The approach improves efficiency and accuracy in NLP applications.
The method is versatile and applicable beyond NLP embeddings.
Abstract
Efficient text embedding is crucial for large-scale natural language processing (NLP) applications, where storage and computational efficiency are key concerns. In this paper, we explore how binary representations (barcodes) can replace real-valued features in NLP embeddings derived from machine learning models such as BERT. Thresholding is a common method for converting continuous embeddings into binary representations, often using a single fixed threshold across all features. We propose a Coordinate Search-based optimization framework that instead identifies the optimal threshold for each feature, demonstrating that feature-specific thresholds lead to improved performance in binary encoding. The resulting binary representations are both accurate and efficient. Our optimal barcode representations have shown promising…
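To make the contrast between a fixed threshold and per-feature thresholds concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the objective (leave-one-out 1-nearest-neighbour accuracy under Hamming distance), the candidate grid (quantiles of each feature), the number of sweeps, and the synthetic data standing in for BERT embeddings are all hypothetical choices, since the abstract does not specify them.

```python
import numpy as np

def binarize(X, thresholds):
    """Map real-valued embeddings to 0/1 codes.
    `thresholds` may be a scalar (fixed) or a per-feature vector."""
    return (X > thresholds).astype(np.uint8)

def nn_accuracy(codes, labels):
    """Leave-one-out 1-NN accuracy under Hamming distance.
    Assumed stand-in objective; the paper's objective may differ."""
    # Pairwise Hamming distance: count of differing bits.
    dist = (codes[:, None, :] != codes[None, :, :]).sum(axis=2)
    np.fill_diagonal(dist, codes.shape[1] + 1)  # exclude self-matches
    nearest = dist.argmin(axis=1)
    return float((labels[nearest] == labels).mean())

def coordinate_search(X, labels, init=0.0, n_candidates=5, sweeps=2):
    """Optimize one threshold at a time, keeping the others fixed.
    A candidate is accepted only if it strictly improves the objective."""
    thresholds = np.full(X.shape[1], init, dtype=float)
    best = nn_accuracy(binarize(X, thresholds), labels)
    for _ in range(sweeps):
        for j in range(X.shape[1]):
            # Assumed candidate grid: quantiles of feature j.
            grid = np.quantile(X[:, j], np.linspace(0.1, 0.9, n_candidates))
            for t in grid:
                trial = thresholds.copy()
                trial[j] = t
                score = nn_accuracy(binarize(X, trial), labels)
                if score > best:
                    best, thresholds = score, trial
    return thresholds, best

# Synthetic stand-in for embeddings: two classes, features with different offsets,
# so a single threshold at 0 is poorly placed for some features.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 30)
offsets = np.linspace(-2.0, 2.0, 8)
X = rng.normal(size=(60, 8)) + offsets + labels[:, None] * 1.0

fixed_acc = nn_accuracy(binarize(X, 0.0), labels)
opt_thresholds, opt_acc = coordinate_search(X, labels, init=0.0)
print(f"fixed threshold accuracy: {fixed_acc:.3f}")
print(f"per-feature threshold accuracy: {opt_acc:.3f}")
```

Because the search starts from the fixed threshold and accepts only strict improvements, the per-feature result can never score below the fixed-threshold baseline on the data it was tuned on. Whether the gain carries over to held-out data is a separate question that this sketch does not address.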
Taxonomy
Topics: Machine Learning and Data Classification
