Enriching Social Science Research via Survey Item Linking
Tornike Tsereteli, Daniel Ruffinelli, Simone Paolo Ponzetto

TL;DR
This paper introduces Survey Item Linking (SIL), a two-stage task (mention detection, then entity disambiguation) that automatically links survey questions mentioned implicitly in social science texts to a knowledge base of survey items, enabling more fine-grained referencing and easier comparison of related work.
Contribution
It defines the SIL task, creates a high-quality dataset for it, and benchmarks deep learning models, highlighting open challenges and directions for future improvement.
Findings
SIL is feasible with deep learning models.
Errors mainly occur in mention detection, especially with multi-sentence context.
End-to-end modeling and more diverse data can improve performance.
Abstract
Questions within surveys, called survey items, are used in the social sciences to study latent concepts, such as the factors influencing life satisfaction. Instead of using explicit citations, researchers paraphrase the content of the survey items they use in-text. However, this makes it challenging to find survey items of interest when comparing related work. Automatically parsing and linking these implicit mentions to survey items in a knowledge base can provide more fine-grained references. We model this task, called Survey Item Linking (SIL), in two stages: mention detection and entity disambiguation. Due to an imprecise definition of the task, existing datasets used to evaluate SIL performance are too small and of low quality. We argue that latent concepts and survey item mentions should be differentiated. To this end, we create a high-quality and richly annotated…
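The two-stage formulation in the abstract can be illustrated with a toy pipeline. This is a minimal sketch, not the authors' method: the paper benchmarks deep learning models, whereas here a hypothetical lexical-overlap (Jaccard) heuristic stands in for both the mention detector and the disambiguation model. Treating each sentence as a candidate mention is also an assumption, and the survey items, identifiers (`Q1`, `Q2`), stopword list, and 0.2 threshold are invented for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical stopword list; a real SIL system would use trained models instead.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "on", "and", "or", "with", "as",
             "how", "do", "are", "is", "was", "were", "you", "your", "they", "their"}


@dataclass(frozen=True)
class SurveyItem:
    item_id: str  # hypothetical knowledge-base identifier
    text: str     # question wording of the survey item


def content_tokens(text: str) -> set[str]:
    """Lowercase alphabetic tokens with stopwords removed."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two token sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def detect_mentions(sentences: list[str], kb: list[SurveyItem],
                    threshold: float = 0.2) -> list[int]:
    """Stage 1 (mention detection): indices of sentences that look like
    survey item mentions, i.e. overlap enough with at least one KB item."""
    hits = []
    for i, sentence in enumerate(sentences):
        tokens = content_tokens(sentence)
        best = max((jaccard(tokens, content_tokens(item.text)) for item in kb),
                   default=0.0)
        if best >= threshold:
            hits.append(i)
    return hits


def disambiguate(sentence: str, kb: list[SurveyItem]) -> SurveyItem:
    """Stage 2 (entity disambiguation): return the most similar KB item."""
    tokens = content_tokens(sentence)
    return max(kb, key=lambda item: jaccard(tokens, content_tokens(item.text)))


def link(sentences: list[str], kb: list[SurveyItem],
         threshold: float = 0.2) -> dict[int, str]:
    """Run both stages: map each detected mention to a survey item id."""
    return {i: disambiguate(sentences[i], kb).item_id
            for i in detect_mentions(sentences, kb, threshold)}


# Toy knowledge base and text (invented examples).
kb = [
    SurveyItem("Q1", "How satisfied are you with your life as a whole?"),
    SurveyItem("Q2", "How much do you trust the national parliament?"),
]
sentences = [
    "Prior work has studied well-being extensively.",
    "Respondents rated how satisfied they were with their life as a whole.",
    "Trust in parliament was measured on an 11-point scale.",
]

print(link(sentences, kb))  # {1: 'Q1', 2: 'Q2'}
```

Keeping the stages separate makes it easy to score mention detection and disambiguation independently; an end-to-end model, which the findings suggest may help, would instead make both decisions jointly.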
Taxonomy
Topics: Computational and Text Analysis Methods
Methods: Balanced Selection
