ALiiCE: Evaluating Positional Fine-grained Citation Generation
Yilong Xu, Jinhua Gao, Xiaoming Yu, Baolong Bi, Huawei Shen, Xueqi Cheng

TL;DR
This paper introduces ALiiCE, an automatic evaluation framework for assessing the quality of positional fine-grained citation generation in language models, addressing a gap in existing sentence-level citation research.
Contribution
ALiiCE is the first framework to evaluate positional fine-grained citations using dependency parsing and novel metrics, advancing the assessment of citation quality in language models.
Findings
ALiiCE effectively evaluates citation quality in long-form QA datasets.
Positional citation metrics correlate with citation accuracy.
Analysis reveals current LLMs' strengths and weaknesses in citation generation.
Abstract
Large Language Models (LLMs) can enhance their credibility and verifiability by generating text with citations. However, existing research on citation generation is predominantly limited to sentence-level statements, neglecting the significance of positional fine-grained citations, which can appear anywhere within a sentence. To facilitate further exploration of positional fine-grained citation generation, we propose ALiiCE, the first automatic evaluation framework for this task. Our method employs a dependency-tree-based approach to parse each sentence-level claim into atomic claims. ALiiCE then evaluates citation quality using three metrics: positional fine-grained citation recall, positional fine-grained citation precision, and the coefficient of variation of citation positions. We evaluate the positional fine-grained citation generation performance of several LLMs on long-form QA datasets. Our experiments and…
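To make the third metric concrete, the following is a minimal sketch of a coefficient of variation computed over citation positions. It uses the standard statistical definition (standard deviation divided by mean); the abstract does not state how ALiiCE defines a citation's position, so the whitespace tokenization, the 1-based token index, the `[n]` marker format, and the helper names `citation_positions` and `coefficient_of_variation` are all assumptions made for illustration, not the authors' implementation.

```python
import re
from statistics import mean, pstdev

# Assumption: citations appear as whitespace-separated markers such as "[1]",
# optionally followed by one punctuation character.
CITATION_TOKEN = re.compile(r"\[\d+\][.,;]?")


def citation_positions(sentence: str) -> list[int]:
    """Return the 1-based token index of every citation marker in the sentence."""
    return [
        index
        for index, token in enumerate(sentence.split(), start=1)
        if CITATION_TOKEN.fullmatch(token)
    ]


def coefficient_of_variation(values: list[int]) -> float:
    """Population standard deviation divided by the mean; 0.0 for fewer than two values."""
    if len(values) < 2:
        return 0.0
    return pstdev(values) / mean(values)


# Hypothetical example sentence with one mid-sentence and one near-final citation.
sentence = "Paris is the capital of France [1] and hosts the Louvre [2] ."
positions = citation_positions(sentence)
cv = coefficient_of_variation(positions)
print(positions, round(cv, 4))
```

Under these assumptions, a response whose citations all sit at the same position (for example, always at the end of the sentence) yields a value near zero, while citations spread across different positions yield a larger value.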
Taxonomy
Topics: Biomedical Text Mining and Ontologies
