SparseCL: Sparse Contrastive Learning for Contradiction Retrieval
Haike Xu, Zongyu Lin, Yizhou Sun, Kai-Wei Chang, Piotr Indyk

TL;DR
SparseCL introduces a novel sparse contrastive learning method that enhances contradiction retrieval by efficiently capturing subtle contradictory nuances, significantly improving accuracy and speed over existing methods in large-scale document retrieval tasks.
Contribution
The paper proposes SparseCL, a new contrastive learning approach that uses sparse sentence embeddings to better identify contradictions, addressing limitations of similarity search and cross-encoder models.
Findings
Over 30% accuracy improvement on MSMARCO and HotpotQA datasets.
Enhanced contradiction detection speed through reduced vector comparisons.
Effective in cleaning corrupted corpora for high-quality QA retrieval.
Abstract
Contradiction retrieval refers to identifying and extracting documents that explicitly disagree with or refute the content of a query, which is important to many downstream applications such as fact checking and data cleaning. For retrieving arguments that contradict a query from large document corpora, existing methods such as similarity search and cross-encoder models exhibit significant limitations. The former struggles to capture the essence of contradiction because it inherently favors similarity, while the latter is computationally inefficient, especially when the corpus is large. To address these challenges, we introduce a novel approach, SparseCL, which leverages specially trained sentence embeddings designed to preserve subtle, contradictory nuances between sentences. Our method utilizes a combined metric of cosine similarity and a sparsity function to…
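The combined metric mentioned in the abstract can be illustrated with a minimal sketch. The excerpt above is truncated and does not name the sparsity function or say how the two terms are weighted, so everything below is an assumption made for illustration: the Hoyer sparsity measure stands in for the sparsity function, it is applied to the difference between the query and document embeddings, the weight `alpha` is hypothetical, and the same toy vectors are used for both terms. The function names are hypothetical as well and do not come from the authors' code.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def hoyer_sparsity(x):
    """Hoyer sparsity measure (assumed here as the sparsity function).

    Equals 1.0 when exactly one entry is non-zero and 0.0 when all
    entries have the same magnitude.
    """
    n = len(x)
    l1 = sum(abs(a) for a in x)
    l2 = math.sqrt(sum(a * a for a in x))
    if n < 2 or l2 == 0.0:
        return 0.0  # sparsity is undefined here; treat as "not sparse"
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1)


def combined_score(query, doc, alpha=1.0):
    """Cosine similarity plus a weighted sparsity term on the difference.

    `alpha` is a hypothetical weight, not a value taken from the paper.
    """
    diff = [q - d for q, d in zip(query, doc)]
    return cosine_similarity(query, doc) + alpha * hoyer_sparsity(diff)


# Toy embeddings: one document differs from the query in a single
# coordinate, the other differs a little in every coordinate.
query = [1.0, 1.0, 1.0, 1.0]
doc_sparse_diff = [1.0, 1.0, 1.0, 0.0]
doc_dense_diff = [0.5, 0.5, 0.5, 0.5]

ranked = sorted(
    [("sparse_diff", doc_sparse_diff), ("dense_diff", doc_dense_diff)],
    key=lambda item: combined_score(query, item[1]),
    reverse=True,
)
```

In this toy setup the document whose difference from the query is concentrated in one coordinate ranks first, even though its cosine similarity to the query is lower. That is the intuition behind adding a sparsity term to a similarity score, under the assumptions stated above.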
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
Methods: Residual Connection · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Adam · Attention Is All You Need · Linear Layer · Multi-Head Attention
