Topic-Grained Text Representation-based Model for Document Retrieval
Mengxue Du, Shasha Li, Jie Yu, Jun Ma, Bin Ji, Huijun Liu, Wuhang Lin, Zibo Yi

TL;DR
This paper introduces TGTR, a document retrieval model that uses topic-grained representations to cut storage needs to under 10% of those of word-grained baselines while keeping comparable retrieval accuracy and outperforming global-grained baselines.
Contribution
TGTR is the first model to utilize topic-grained representations for document retrieval, reducing storage costs without sacrificing accuracy.
Findings
TGTR requires less than 10% of the storage space used by word-grained baselines.
TGTR achieves comparable retrieval accuracy to word-grained methods.
TGTR outperforms global-grained baselines in accuracy.
Abstract
Document retrieval enables users to find the documents they need accurately and quickly. To meet retrieval efficiency requirements, prevalent deep neural methods adopt a representation-based matching paradigm, which saves online matching time by pre-storing document representations offline. However, this paradigm consumes vast local storage space, especially when documents are stored as word-grained representations. To tackle this, we present TGTR, a Topic-Grained Text Representation-based Model for document retrieval. Following the representation-based matching paradigm, TGTR stores the document representations offline to ensure retrieval efficiency, but it significantly reduces the storage requirements by using novel topic-grained representations rather than traditional word-grained ones. Experimental results demonstrate that compared to word-grained baselines, TGTR is…
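The storage trade-off described in the abstract can be illustrated with back-of-the-envelope arithmetic. The sketch below is not the authors' code: the corpus size, embedding width, tokens per document, and topics per document are all hypothetical values chosen for illustration, and the helper `index_bytes` is an invented name. It only shows why storing one vector per topic rather than one per word shrinks an offline index.

```python
# Illustrative storage arithmetic only. Every size below is an assumption,
# not a figure reported for TGTR.

def index_bytes(num_docs, vectors_per_doc, dim, bytes_per_value=4):
    """Bytes needed to pre-store `vectors_per_doc` float32 vectors per document."""
    return num_docs * vectors_per_doc * dim * bytes_per_value

NUM_DOCS = 1_000_000     # hypothetical corpus size
DIM = 128                # hypothetical embedding width
TOKENS_PER_DOC = 400     # word-grained: one vector per token (assumed length)
TOPICS_PER_DOC = 20      # topic-grained: one vector per topic (assumed count)

word_grained = index_bytes(NUM_DOCS, TOKENS_PER_DOC, DIM)
topic_grained = index_bytes(NUM_DOCS, TOPICS_PER_DOC, DIM)
global_grained = index_bytes(NUM_DOCS, 1, DIM)   # one vector per document

# Fraction of the word-grained index that the topic-grained index occupies.
ratio = topic_grained / word_grained

print(f"word-grained:   {word_grained / 1e9:.1f} GB")
print(f"topic-grained:  {topic_grained / 1e9:.1f} GB")
print(f"global-grained: {global_grained / 1e9:.1f} GB")
print(f"topic / word:   {ratio:.0%}")
```

Under these assumed sizes the ratio depends only on topics per document divided by tokens per document, so the saving grows with document length as long as the number of topics stays small.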
Taxonomy
Topics: Topic Modeling · Information Retrieval and Search Behavior · Text and Document Classification Technologies
