UniFAR: A Unified Facet-Aware Retrieval Framework for Scientific Documents
Zheng Dou, Zhao Zhang, Deqing Wang, Yikun Ban, Fuzhen Zhuang

TL;DR
UniFAR is a unified retrieval framework that supports both document-to-document and question-driven scientific document retrieval by addressing mismatches in input granularity, semantic focus, and training signals.
Contribution
UniFAR introduces a single architecture that jointly supports document-document (doc-doc) and question-driven (q-doc) retrieval, aligning document structure with question intent and unifying training signals.
Findings
Outperforms prior methods across multiple retrieval tasks.
Demonstrates effectiveness and generality with various models.
Addresses key mismatches in current SDR approaches.
Abstract
Existing scientific document retrieval (SDR) methods primarily rely on document-centric representations learned from inter-document relationships for document-document (doc-doc) retrieval. However, the rise of LLMs and RAG has shifted SDR toward question-driven retrieval, where documents are retrieved in response to natural-language questions (q-doc). This change has led to systematic mismatches between document-centric models and question-driven retrieval, including (1) input granularity (long documents vs. short questions), (2) semantic focus (scientific discourse structure vs. specific question intent), and (3) training signals (citation-based similarity vs. question-oriented relevance). To this end, we propose UniFAR, a Unified Facet-Aware Retrieval framework to jointly support doc-doc and q-doc SDR within a single architecture. UniFAR reconciles granularity differences through…
Taxonomy
Topics: Topic Modeling · Information Retrieval and Search Behavior · Biomedical Text Mining and Ontologies
