TL;DR
This paper investigates why large pre-trained language models such as BERT excel at text ranking. It finds that their effectiveness does not depend primarily on modeling syntax, and points instead to mechanisms such as query-passage cross-attention and contextual embeddings.
Contribution
It demonstrates that syntactic information is not crucial for BERT's ranking performance, emphasizing the importance of cross-attention and contextual embeddings instead.
Findings
Syntactic structure disruption does not significantly impair BERT's ranking performance.
The authors point to query-passage cross-attention and richer, context-aggregating embeddings as the likely sources of BERT's effectiveness.
Term-based methods like BM25 are outperformed by BERT in certain conditions.
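For context on the term-based baseline, the following is a minimal sketch of BM25 scoring, showing that a bag-of-words scorer ignores word order by construction. It is an illustration only, not the paper's experimental setup: the function name `bm25_score`, the whitespace tokenizer, the toy corpus, the defaults `k1=1.2` and `b=0.75`, and the smoothed IDF variant are all assumptions.

```python
import math
from collections import Counter


def bm25_score(query, doc, corpus, k1=1.2, b=0.75):
    """Score `doc` against `query` with a common BM25 variant (illustrative)."""
    q_terms = query.lower().split()
    doc_terms = doc.lower().split()
    tokenized = [d.lower().split() for d in corpus]

    n_docs = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n_docs
    tf = Counter(doc_terms)  # term counts only; positions are discarded here

    score = 0.0
    for term in q_terms:
        # Document frequency: number of corpus documents containing the term.
        df = sum(1 for d in tokenized if term in d)
        # Smoothed IDF (assumed variant), always non-negative.
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        freq = tf[term]
        denom = freq + k1 * (1 - b + b * len(doc_terms) / avgdl)
        score += idf * freq * (k1 + 1) / denom
    return score


corpus = ["the cat sat on the mat", "dogs chase cats", "a mat for the cat"]
original = corpus[0]
shuffled = "mat the on sat cat the"  # same words, different order

# Both calls see identical term counts and document length.
print(bm25_score("cat mat", original, corpus))
print(bm25_score("cat mat", shuffled, corpus))
```

Because the score depends only on term counts and document length, reordering the passage leaves it unchanged.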
Abstract
Even though term-based methods such as BM25 provide strong baselines in ranking, under certain conditions they are dominated by large pre-trained masked language models (MLMs) such as BERT. To date, the source of their effectiveness remains unclear. Is it their ability to truly understand the meaning through modeling syntactic aspects? We answer this by manipulating the input order and position information in a way that destroys the natural sequence order of query and passage, and show that the model still achieves comparable performance. Overall, our results highlight that syntactic aspects do not play a critical role in the effectiveness of re-ranking with BERT. We point to other mechanisms, such as query-passage cross-attention and richer embeddings that capture word meanings based on aggregated context regardless of the word order, as the main contributors to its superior…
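The order manipulation described in the abstract can be illustrated with a small sketch. The abstract does not specify the exact perturbation procedure, so this is only one plausible reading: it shuffles whitespace-separated words within the query and within the passage independently. The names `shuffle_words` and `perturb_pair` are hypothetical, and the sketch does not touch the model's position information, which the paper also manipulates.

```python
import random


def shuffle_words(text, rng):
    """Return `text` with its whitespace-separated words in random order."""
    words = text.split()
    rng.shuffle(words)
    return " ".join(words)


def perturb_pair(query, passage, seed=0):
    """Shuffle query and passage separately so neither keeps its natural order.

    Assumption: word-level shuffling; a subword-level or position-id
    perturbation would be implemented differently.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    return shuffle_words(query, rng), shuffle_words(passage, rng)


query = "what causes ocean tides"
passage = "tides are caused by the gravitational pull of the moon and the sun"

shuffled_query, shuffled_passage = perturb_pair(query, passage, seed=13)
print(shuffled_query)
print(shuffled_passage)
```

The shuffled pair would then be passed to the re-ranker in place of the original pair, and ranking metrics on the two inputs compared.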
Taxonomy
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Dropout · Linear Warmup With Linear Decay · Weight Decay · Layer Normalization · WordPiece · Softmax
