Identifying Reference Spans: Topic Modeling and Word Embeddings help IR
Luis Moraes, Shahryar Baki, Rakesh Verma, Daniel Lee

TL;DR
This paper explores how topic models and word embeddings can improve the identification of reference spans in scientific documents, advancing the state of the art in citation analysis tasks.
Contribution
It introduces a novel approach combining topic modeling and word embeddings that outperforms previous systems in reference span identification.
Findings
Topic models and word embeddings improve reference span identification
Surpasses the previously best-performing system on the CL-SciSumm 2016 task
Provides insights into citation span detection
Abstract
The CL-SciSumm 2016 shared task introduced an interesting problem: given a document D and a piece of text that cites D, how do we identify the text spans of D being referenced by the piece of text? The shared task provided the first annotated dataset for studying this problem. We present an analysis of our continued work in improving our system's performance on this task. We demonstrate how topic models and word embeddings can be used to surpass the previously best performing system.
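To make the task concrete, the following is a minimal sketch of reference span identification framed as sentence retrieval: each sentence of D is scored against the citing text and the sentences are returned in ranked order. It is not the authors' system. It uses plain TF-IDF cosine similarity as a stand-in baseline, and the names `tokenize` and `rank_sentences` are hypothetical. A topic model or word embeddings, as the abstract describes, would replace or augment the similarity function.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_sentences(reference_sentences, citing_text):
    """Return indices of the sentences of D, most similar to the citing text first.

    Assumption: similarity is TF-IDF cosine, chosen only for illustration.
    """
    docs = [Counter(tokenize(s)) for s in reference_sentences]
    n = len(docs)

    # Document frequency: number of sentences in D containing each term.
    df = Counter()
    for d in docs:
        df.update(d.keys())

    def idf(term):
        # Smoothed IDF, so terms that never occur in D still get a finite weight.
        return math.log((1 + n) / (1 + df[term])) + 1.0

    def vectorize(counts):
        return {t: c * idf(t) for t, c in counts.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        if norm_a == 0.0 or norm_b == 0.0:
            return 0.0
        return dot / (norm_a * norm_b)

    query = vectorize(Counter(tokenize(citing_text)))
    scored = [(cosine(vectorize(d), query), i) for i, d in enumerate(docs)]
    # Sort by descending score; ties are broken by sentence position.
    return [i for _, i in sorted(scored, key=lambda p: (-p[0], p[1]))]


if __name__ == "__main__":
    sentences = [
        "We train a topic model on the corpus.",
        "The weather was pleasant.",
        "Word embeddings capture similarity between terms.",
    ]
    citance = "They use word embeddings to measure similarity."
    print(rank_sentences(sentences, citance))
```

In this framing, the top-ranked sentences are the candidate reference spans. How many to keep is a separate design choice that the sketch leaves open.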
