BoolQuestions: Does Dense Retrieval Understand Boolean Logic in Language?
Zongmeng Zhang, Jinhua Zhu, Wengang Zhou, Xiang Qi, Peng Zhang, Houqiang Li

TL;DR
This paper evaluates whether current dense retrieval models understand Boolean logic in language, introduces a benchmark dataset, and proposes a training method to improve their comprehension of Boolean operations.
Contribution
It formulates the Boolean Dense Retrieval task, creates the BoolQuestions benchmark dataset, and introduces a contrastive continual training approach to enhance Boolean logic understanding.
Findings
Current dense retrieval models do not fully grasp Boolean logic.
The BoolQuestions dataset enables evaluation of Boolean comprehension.
The proposed training method improves Boolean logic understanding.
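As an illustration of what evaluating Boolean comprehension could involve, the following is a minimal sketch, not the paper's evaluation protocol. It assumes plain set semantics for two-operand queries (AND as intersection, OR as union, NOT as difference) and scores a ranked list against the resulting gold set. The function names `boolean_gold`, `recall_at_k`, and `violation_rate_at_k`, and the toy passage ids, are hypothetical.

```python
def boolean_gold(op, rel_a, rel_b):
    """Gold passage ids for a two-operand Boolean query.

    Assumption: set semantics. rel_a and rel_b are the sets of passage ids
    relevant to the two sub-queries A and B.
    """
    if op == "AND":
        return rel_a & rel_b
    if op == "OR":
        return rel_a | rel_b
    if op == "NOT":  # "A but not B"
        return rel_a - rel_b
    raise ValueError(f"unsupported operator: {op}")


def recall_at_k(ranked_ids, gold, k):
    """Fraction of gold passages that appear in the top-k ranked results."""
    if not gold:
        return 0.0
    hits = sum(1 for pid in ranked_ids[:k] if pid in gold)
    return hits / len(gold)


def violation_rate_at_k(ranked_ids, excluded, k):
    """Fraction of the top-k results that the query explicitly excludes."""
    top = ranked_ids[:k]
    if not top:
        return 0.0
    return sum(1 for pid in top if pid in excluded) / len(top)


# Toy data (hypothetical): passages relevant to sub-queries A and B,
# and a ranking returned by some retriever for the composed query.
rel_a = {1, 2, 3}
rel_b = {3, 4}
ranked = [3, 1, 4, 2, 5]

and_gold = boolean_gold("AND", rel_a, rel_b)
not_gold = boolean_gold("NOT", rel_a, rel_b)
and_recall = recall_at_k(ranked, and_gold, k=1)
not_recall = recall_at_k(ranked, not_gold, k=2)
not_violations = violation_rate_at_k(ranked, rel_b, k=2)
```

A retriever that ranks only by topical relevance would tend to score well on the AND query here while still placing excluded passages near the top for the NOT query, which is the kind of gap a Boolean-aware metric is meant to expose.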
Abstract
Dense retrieval, which aims to encode the semantic information of arbitrary text into dense vector representations or embeddings, has emerged as an effective and efficient paradigm for text retrieval, consequently becoming an essential component in various natural language processing systems. These systems typically focus on optimizing the embedding space by attending to the relevance of text pairs, while overlooking the Boolean logic inherent in language, which may not be captured by current training objectives. In this work, we first investigate whether current retrieval systems can comprehend the Boolean logic implied in language. To answer this question, we formulate the task of Boolean Dense Retrieval and collect a benchmark dataset, BoolQuestions, which covers complex queries containing basic Boolean logic and corresponding annotated passages. Through extensive experimental…
Taxonomy
Topics: Machine Learning and Algorithms
Methods: Focus
