SetBERT: Enhancing Retrieval Performance for Boolean Logic and Set Operation Queries
Quan Mai, Susan Gauch, Douglas Adams

TL;DR
SetBERT is a fine-tuned BERT model that significantly improves retrieval for Boolean logic and set operation queries by using a novel inversed-contrastive loss and a dataset generated by prompting GPT.
Contribution
This paper introduces SetBERT, a BERT-based model with a new loss function and dataset generation method that improves retrieval performance for logic-structured queries.
Findings
SetBERT-base outperforms BERT-base by up to 63% in recall.
SetBERT-base achieves performance comparable to BERT-large despite its smaller size.
Fine-tuning with triplet loss degrades performance for this task.
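The recall figure above can be made concrete with a small sketch of recall at a cutoff k, a standard retrieval metric. This is an illustration only: the summary does not say which cutoff the paper uses or whether the 63% is a relative or absolute gain, and the function name, document ids, and k below are hypothetical.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top-k results."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("relevant_ids must be non-empty")
    top_k = set(ranked_ids[:k])
    return len(top_k & relevant) / len(relevant)


# Hypothetical ranking returned for one query, best match first.
ranked = ["d3", "d1", "d7", "d2"]
# Hypothetical gold set for the same query.
relevant = ["d1", "d2"]

# Only d1 is in the top 3, so half of the relevant documents are recovered.
score = recall_at_k(ranked, relevant, k=3)
```

Averaging this value over all test queries gives a single recall score per model, which is the kind of number the comparison between SetBERT-base and BERT-base would rest on.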
Abstract
We introduce SetBERT, a fine-tuned BERT-based model designed to enhance query embeddings for set operations and Boolean logic queries, such as Intersection (AND), Difference (NOT), and Union (OR). SetBERT significantly improves retrieval performance for logic-structured queries, an area where both traditional and neural retrieval methods typically underperform. We propose an innovative use of inversed-contrastive loss, which focuses on identifying the negative sentence, and we fine-tune BERT on a dataset generated by prompting GPT. Furthermore, we demonstrate that, unlike for other BERT-based models, fine-tuning with triplet loss actually degrades performance for this specific task. Our experiments reveal that SetBERT-base not only significantly outperforms BERT-base (up to a 63% improvement in Recall) but also achieves performance comparable to the much larger BERT-large model, despite being…
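For readers unfamiliar with the triplet loss that the abstract reports as harmful for this task, the following is a minimal sketch of the standard margin-based formulation over a query, a positive sentence, and a negative sentence. It is not the paper's inversed-contrastive loss, which this summary does not define. The cosine distance, the margin value, and the toy two-dimensional embeddings are assumptions chosen for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_loss(query, positive, negative, margin=0.5):
    """Standard triplet margin loss using cosine distance (1 - similarity).

    The loss is zero once the positive is closer to the query than the
    negative by at least `margin` (an assumed value, not from the paper).
    """
    d_pos = 1.0 - cosine_similarity(query, positive)
    d_neg = 1.0 - cosine_similarity(query, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy embeddings (hypothetical): the positive matches the query direction,
# the negative is orthogonal to it.
query = [1.0, 0.0]
well_separated = triplet_loss(query, positive=[1.0, 0.0], negative=[0.0, 1.0])
badly_separated = triplet_loss(query, positive=[0.0, 1.0], negative=[1.0, 0.0])
```

In the first call the negative is already far enough away, so the loss is zero; in the second the roles are swapped and the loss is positive, which is the signal that would push the embeddings apart during fine-tuning.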
Taxonomy
Topics: Advanced Database Systems and Queries · Machine Learning and Algorithms · Algorithms and Data Compression
Methods: Sparse Evolutionary Training · Cosine Annealing · Linear Warmup With Cosine Annealing · Byte Pair Encoding · WordPiece · Residual Connection · Discriminative Fine-Tuning · Weight Decay · Softmax
