HyReC: Exploring Hybrid-based Retriever for Chinese
Zunran Wang, Zheng Shenpeng, Wang Shenglan, Minghui Zhao, Zhonghua Li

TL;DR
HyReC is a novel end-to-end method for hybrid Chinese retrieval that combines semantic union, a global-local-aware encoder, and normalization to improve performance and semantic consistency.
Contribution
It introduces HyReC, a hybrid retrieval model that pairs a Global-Local-Aware Encoder (GLAE) with a Normalization Module (NM) and is designed specifically for Chinese retrieval tasks.
Findings
HyReC outperforms existing methods on the C-MTEB benchmark.
Semantic union enhances retrieval accuracy.
GLAE and NM improve semantic sharing and alignment between lexicon-based and dense retrieval.
Abstract
Hybrid-based retrieval methods, which unify dense-vector and lexicon-based retrieval, have garnered considerable attention in industry for their performance gains. However, despite their promising results, the application of these hybrid paradigms to Chinese retrieval has remained largely underexplored. In this paper, we introduce HyReC, an innovative end-to-end optimization method tailored specifically for hybrid-based retrieval in Chinese. HyReC enhances performance by integrating the semantic union of terms into the representation model. Additionally, it features the Global-Local-Aware Encoder (GLAE) to promote consistent semantic sharing between lexicon-based and dense retrieval while minimizing the interference between them. To further refine alignment, we incorporate a Normalization Module (NM) that fosters mutual benefits between the retrieval approaches. Finally,…
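Illustrative sketch
The abstract describes hybrid retrieval as unifying dense-vector and lexicon-based scoring. The sketch below shows one generic way such scores can be fused at ranking time. It is not HyReC's architecture: the min-max scaling here is a simple stand-in and not the paper's Normalization Module, and the names `dense_score`, `lexical_score`, `min_max`, `hybrid_rank`, and the weight `alpha` are all hypothetical, chosen only for illustration.

```python
from typing import Dict, List


def dense_score(q_vec: List[float], d_vec: List[float]) -> float:
    """Dot product between a query embedding and a document embedding."""
    return sum(a * b for a, b in zip(q_vec, d_vec))


def lexical_score(q_terms: Dict[str, float], d_terms: Dict[str, float]) -> float:
    """Sparse dot product over the terms shared by query and document."""
    return sum(w * d_terms[t] for t, w in q_terms.items() if t in d_terms)


def min_max(scores: List[float]) -> List[float]:
    """Rescale scores to [0, 1]; return all zeros if every score is equal."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_rank(
    q_vec: List[float],
    q_terms: Dict[str, float],
    docs: List[dict],
    alpha: float = 0.5,
) -> List[int]:
    """Return document indices sorted by a weighted sum of the two score types.

    Assumption: each document is a dict with a dense "vec" and sparse "terms".
    alpha weights the dense score; (1 - alpha) weights the lexical score.
    """
    dense = min_max([dense_score(q_vec, d["vec"]) for d in docs])
    lexical = min_max([lexical_score(q_terms, d["terms"]) for d in docs])
    fused = [alpha * x + (1 - alpha) * y for x, y in zip(dense, lexical)]
    return sorted(range(len(docs)), key=lambda i: fused[i], reverse=True)


if __name__ == "__main__":
    docs = [
        {"vec": [1.0, 0.0], "terms": {"检索": 2.0}},
        {"vec": [0.0, 1.0], "terms": {}},
        {"vec": [0.5, 0.5], "terms": {"检索": 1.0}},
    ]
    print(hybrid_rank([1.0, 0.0], {"检索": 1.0}, docs))
```

Rescaling both score lists to a common range before mixing keeps one retriever's raw score magnitude from dominating the other, which is the usual motivation for normalizing in score fusion.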
Taxonomy
Topics: Information Retrieval and Search Behavior · Topic Modeling · Image Retrieval and Classification Techniques
