CAT-ID$^2$: Category-Tree Integrated Document Identifier Learning for Generative Retrieval In E-commerce
Xiaoyu Liu, Fuwei Zhang, Yiqing Wu, Xinyu Jia, Zenghua Xia, Fuzhen Zhuang, Zhao Zhang, Fei Jiang, Wei Lin

TL;DR
This paper introduces CAT-ID$^2$, a novel method for generating document identifiers in e-commerce that incorporate category information, improving retrieval accuracy and user engagement in generative retrieval systems.
Contribution
The paper proposes a new ID learning approach that integrates category-tree information into semantic IDs, enhancing representational power and retrieval performance.
Findings
Improved document ID quality with category integration.
Enhanced retrieval accuracy demonstrated by online A/B tests.
Increased user engagement metrics in e-commerce search.
Abstract
Generative retrieval (GR) has gained significant attention as an effective paradigm that integrates the capabilities of large language models (LLMs). It generally consists of two stages: constructing discrete semantic identifiers (IDs) for documents and retrieving documents by autoregressively generating ID tokens. The core challenge in GR is how to construct document IDs (DocIDs) with strong representational power. Good IDs should exhibit two key properties: similar documents should have more similar IDs, and each document should maintain a distinct and unique ID. However, most existing methods ignore native category information, which is common and critical in e-commerce. Therefore, we propose a novel ID learning method, CAtegory-Tree Integrated Document IDentifier (CAT-ID$^2$), incorporating prior category information into the semantic IDs. CAT-ID$^2$ includes three key modules: a…
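The two ID properties named in the abstract (similar documents get similar IDs, and every document gets a unique ID) can be illustrated with a toy sketch. This is not the CAT-ID$^2$ method, whose modules are not described in the text above. It is a minimal illustration that assumes a category path is used as the ID prefix, a one-dimensional embedding in [0, 1) is bucketed by a stand-in quantizer, and a counter suffix breaks ties. The names `build_doc_ids` and `shared_prefix_len`, the `codebook_size` parameter, and the sample documents are all hypothetical.

```python
from collections import defaultdict


def build_doc_ids(docs, codebook_size=4):
    """Map each document to a tuple of ID tokens.

    docs: dict of name -> (category_path, embedding), where embedding[0]
    is assumed to lie in [0, 1). This is a toy stand-in for a learned
    quantizer, not the paper's ID learning procedure.
    """
    ids = {}
    seen = defaultdict(int)  # how many documents already share each prefix
    for name in sorted(docs):  # sorted for deterministic suffix assignment
        category_path, embedding = docs[name]
        # Toy quantizer: bucket the first embedding value into a code index.
        code = min(int(embedding[0] * codebook_size), codebook_size - 1)
        prefix = tuple(category_path) + (f"c{code}",)
        # A per-prefix counter keeps IDs unique when prefixes collide.
        suffix = seen[prefix]
        seen[prefix] += 1
        ids[name] = prefix + (f"u{suffix}",)
    return ids


def shared_prefix_len(a, b):
    """Number of leading ID tokens two IDs have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Hypothetical sample documents: (category path, 1-D embedding).
docs = {
    "red_sneaker": (("shoes", "sneakers"), [0.10]),
    "blue_sneaker": (("shoes", "sneakers"), [0.12]),
    "leather_boot": (("shoes", "boots"), [0.80]),
    "usb_cable": (("electronics", "cables"), [0.50]),
}
doc_ids = build_doc_ids(docs)
```

In this sketch the two sneakers share the category tokens and the code token and differ only in the final suffix, while documents from unrelated categories share no prefix at all. An autoregressive decoder that generates ID tokens left to right would therefore commit to the category before the finer-grained tokens.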
Taxonomy
Topics: Information Retrieval and Search Behavior · Topic Modeling · Text and Document Classification Technologies
