GLOW : Global Weighted Self-Attention Network for Web Search
Xuan Shan, Chuanjie Liu, Yiqian Xia, Qi Chen, Yusi Zhang, Kaize Ding, Yaobo Liang, Angen Luo, Yuxiang Luo

TL;DR
GLOW introduces a global weighted self-attention mechanism that incorporates corpus-wide statistics into deep matching models, significantly improving web document retrieval performance over BERT by capturing global importance and whole word attention.
Contribution
The paper proposes GLOW, a novel self-attention network that fuses global corpus knowledge into deep matching models and introduces whole word weight sharing for improved web search relevance.
Findings
GLOW outperforms BERT and baselines on public datasets.
GLOW effectively captures topical and semantic information.
The model maintains BERT's complexity while enhancing performance.
Abstract
Deep matching models aim to help search engines retrieve more relevant documents by mapping queries and documents into semantic vectors in first-stage retrieval. When BERT is used as the deep matching model, the attention score between two words is built solely upon local contextualized word embeddings. It lacks prior global knowledge to distinguish the importance of different words, which has been shown to play a critical role in information retrieval tasks. In addition, BERT only performs attention across sub-word tokens, which weakens whole word attention representation. We propose a novel Global Weighted Self-Attention (GLOW) network for web document search. GLOW fuses global corpus statistics into the deep matching model. By adding prior weights into attention generation from global information, like BM25, GLOW successfully learns weighted attention scores…
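The abstract is truncated here and does not give GLOW's exact formulation, so the following is only a minimal sketch of the general idea it describes: biasing self-attention with a corpus-level prior weight per token, and letting the sub-word tokens of one word share that word's weight. The function names (`idf_weights`, `weighted_self_attention`), the choice of a BM25-style IDF as the prior, and the decision to add the log prior to the attention logits are all assumptions made for illustration, not the authors' method.

```python
import math
import numpy as np

def idf_weights(corpus, vocab):
    """BM25-style IDF per word over a toy corpus (hypothetical helper).

    corpus: list of sets of words, one set per document.
    Uses the always-positive variant log(1 + (N - df + 0.5) / (df + 0.5)).
    """
    n_docs = len(corpus)
    weights = {}
    for word in vocab:
        df = sum(1 for doc in corpus if word in doc)
        weights[word] = math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
    return weights

def weighted_self_attention(x, prior):
    """Single-head self-attention with a global prior on key tokens.

    ASSUMPTION: the prior enters as log(prior) added to the logits, which
    is the same as scaling each key's unnormalized attention by its prior.
    x:     (seq_len, dim) token embeddings, used as queries, keys, values.
    prior: (seq_len,) positive global weight per key token.
    """
    dim = x.shape[-1]
    logits = x @ x.T / np.sqrt(dim)
    logits = logits + np.log(prior)[None, :]
    # Subtract the row maximum for a numerically stable softmax.
    logits = logits - logits.max(axis=-1, keepdims=True)
    attn = np.exp(logits)
    attn = attn / attn.sum(axis=-1, keepdims=True)
    return attn @ x, attn

# Toy usage: "sea" and "##rch" are sub-word tokens of the word "search",
# so both positions share that word's weight.
corpus = [{"the", "search"}, {"the", "web"}, {"the", "glow"}]
word_weight = idf_weights(corpus, {"the", "search"})
token_words = ["search", "search", "the"]
prior = np.array([word_weight[w] for w in token_words])
x = np.random.default_rng(0).normal(size=(3, 8))
output, attn = weighted_self_attention(x, prior)
```

In this sketch the rare word "search" receives a larger prior than the ubiquitous "the", so its tokens attract more attention mass than they would from the local embeddings alone.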
Taxonomy
Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Normalizing Flows · Invertible 1x1 Convolution · Affine Coupling · Activation Normalization · GLOW · Linear Warmup With Linear Decay
