Identifying Reference Spans: Topic Modeling and Word Embeddings help IR

Luis Moraes; Shahryar Baki; Rakesh Verma; Daniel Lee

arXiv:1708.02989·cs.CL·August 11, 2017

TL;DR

This paper shows how topic models and word embeddings can improve the identification of reference spans (the passages of a cited document that a citing text refers to) on the CL-SciSumm 2016 shared task.

Contribution

It extends the authors' existing system with topic modeling and word embeddings, surpassing the previously best-performing system at reference span identification.

Findings

01

Topic models and word embeddings enhance performance

02

Surpasses the previously best-performing system on the CL-SciSumm 2016 task

03

Presents an analysis of the system's performance on reference span identification

Abstract

The CL-SciSumm 2016 shared task introduced an interesting problem: given a document D and a piece of text that cites D, how do we identify the text spans of D being referenced by the piece of text? The shared task provided the first annotated dataset for studying this problem. We present an analysis of our continued work in improving our system's performance on this task. We demonstrate how topic models and word embeddings can be used to surpass the previously best performing system.
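
To make the task framing concrete, the sketch below treats reference span identification as a retrieval problem: the citing text is the query, and the sentences of the cited document D are the candidates to rank. It is a minimal TF-IDF and cosine-similarity baseline written for illustration only, not the authors' system, and it uses neither topic models nor word embeddings. The function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_reference_sentences`), the smoothed IDF formula, and the `top_k` parameter are all assumptions introduced here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(token_lists):
    """Build sparse TF-IDF vectors (dicts) for a list of token lists.

    Uses a smoothed IDF, log((1 + n) / (1 + df)) + 1, chosen for this sketch.
    """
    n = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}
    return [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in token_lists
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_reference_sentences(reference_sentences, citing_text, top_k=2):
    """Return (sentence_index, score) pairs for the top_k candidate spans.

    Hypothetical helper: sentences of the cited document D are ranked by
    TF-IDF cosine similarity to the citing text.
    """
    docs = [tokenize(s) for s in reference_sentences] + [tokenize(citing_text)]
    vectors = tfidf_vectors(docs)
    query = vectors[-1]
    scored = [(cosine(query, vec), i) for i, vec in enumerate(vectors[:-1])]
    # Highest score first; ties broken by position in the document.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(i, score) for score, i in scored[:top_k]]


# Toy data invented for this example.
reference_sentences = [
    "We train a topic model on the corpus.",
    "Word embeddings capture semantic similarity between terms.",
    "The results are reported in the final section.",
]
citing_text = "They use word embeddings to measure semantic similarity."
ranked = rank_reference_sentences(reference_sentences, citing_text)
print(ranked[0][0])  # index of the best-matching sentence in D
```

A purely lexical ranker like this one scores a sentence at zero when it shares no words with the citing text, even if the two are about the same thing. That is the kind of gap topic models and word embeddings are suited to narrowing, since both can relate texts that use different vocabulary.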
