FinCPRG: A Bidirectional Generation Pipeline for Hierarchical Queries and Rich Relevance in Financial Chinese Passage Retrieval
Xuan Xu, Beilin Chu, Qinhong Lin, Yixiao Zhong, Fufang Wen, Jiaqi Liu, Binjie Fei, Yu Li, Zhongliang Yang, Linna Zhou

TL;DR
This paper introduces FinCPRG, a bidirectional generation pipeline that creates hierarchical queries and enriches relevance labels for Chinese financial passage retrieval, enhancing dataset quality and retrieval performance.
Contribution
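The paper proposes a bidirectional generation pipeline that combines two query generation methods (bottom-up from single-document text and top-down from multi-document titles) with additional relevance-label mining, producing a dataset for Chinese financial passage retrieval.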
Findings
The FinCPRG dataset contains 1.3k Chinese financial reports with hierarchical queries.
The pipeline improves relevance annotation quality and diversity.
Experiments show enhanced retrieval performance using FinCPRG.
Abstract
In recent years, large language models (LLMs) have demonstrated significant potential in constructing passage retrieval datasets. However, existing methods still face limitations in expressing cross-doc query needs and controlling annotation quality. To address these issues, this paper proposes a bidirectional generation pipeline, which aims to generate 3-level hierarchical queries for both intra-doc and cross-doc scenarios and mine additional relevance labels on top of direct mapping annotation. The pipeline introduces two query generation methods: bottom-up from single-doc text and top-down from multi-doc titles. The bottom-up method uses LLMs to disassemble and generate structured queries at both sentence-level and passage-level simultaneously from intra-doc passages. The top-down approach incorporates three key financial elements--industry, topic, and time--to divide report titles…
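The abstract is cut off at this point, so what the pipeline does after dividing report titles is not shown here. As a rough illustration of the division step only, the sketch below groups titles by the three named elements (industry, topic, time). The `ReportTitle` record, its field names, the `group_titles` helper, and the sample titles are all hypothetical and are not taken from the paper; it also assumes the three elements have already been extracted for each title.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportTitle:
    # Hypothetical record; the paper's actual schema is not given in this summary.
    title: str
    industry: str
    topic: str
    period: str  # time granularity (year, quarter, ...) is an assumption


def group_titles(reports):
    """Partition report titles by their (industry, topic, period) key."""
    groups = defaultdict(list)
    for report in reports:
        key = (report.industry, report.topic, report.period)
        groups[key].append(report.title)
    return dict(groups)


# Made-up sample data for illustration only.
reports = [
    ReportTitle("Bank A annual outlook", "banking", "outlook", "2024"),
    ReportTitle("Bank B annual outlook", "banking", "outlook", "2024"),
    ReportTitle("Chipmaker C earnings review", "semiconductors", "earnings", "2024"),
]
groups = group_titles(reports)
```

Each resulting group collects titles from several documents that share the same three elements, which is one plausible starting point for writing a cross-doc query over that group.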
Taxonomy
Topics: Stock Market Forecasting Methods · Advanced Text Analysis Techniques · Financial Reporting and XBRL
