Enriching Social Science Research via Survey Item Linking

Tornike Tsereteli; Daniel Ruffinelli; Simone Paolo Ponzetto

arXiv:2412.15831·cs.DL·December 23, 2024

TL;DR

This paper introduces Survey Item Linking (SIL), a two-stage task to automatically connect survey questions mentioned implicitly in social science texts to a knowledge base, improving research referencing and comparison.

Contribution

It defines the SIL task, creates a high-quality dataset for it, and benchmarks deep learning models, highlighting challenges and directions for future improvements.

Findings

01 SIL is feasible with deep learning models.

02 Errors mainly occur in mention detection, especially with multi-sentence context.

03 End-to-end modeling and more diverse data can improve performance.

Abstract

Questions within surveys, called survey items, are used in the social sciences to study latent concepts, such as the factors influencing life satisfaction. Instead of using explicit citations, researchers paraphrase the content of the survey items they use in-text. However, this makes it challenging to find survey items of interest when comparing related work. Automatically parsing and linking these implicit mentions to survey items in a knowledge base can provide more fine-grained references. We model this task, called Survey Item Linking (SIL), in two stages: mention detection and entity disambiguation. Because the task has been imprecisely defined, existing datasets for evaluating SIL performance are too small and of low quality. We argue that latent concepts and survey item mentions should be differentiated. To this end, we create a high-quality and richly annotated…
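
The two stages named in the abstract can be illustrated with a toy pipeline. This is a minimal sketch for intuition only, not the authors' models: the paper benchmarks deep learning models, whereas this sketch swaps in a hand-written cue-word rule for mention detection and Jaccard token overlap for entity disambiguation. The names `SurveyItem`, `CUES`, `detect_mentions`, `disambiguate`, and `link` are hypothetical and do not come from the paper or its repository.

```python
import re
from dataclasses import dataclass


@dataclass
class SurveyItem:
    """One knowledge-base entry (hypothetical schema, for illustration)."""
    item_id: str
    text: str


# Assumption: a tiny hand-picked cue list stands in for a learned detector.
CUES = {"asked", "respondents", "item", "question", "measured"}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def detect_mentions(sentences):
    """Stage 1 (mention detection): keep sentences containing a cue word."""
    return [s for s in sentences if tokenize(s) & CUES]


def disambiguate(mention, kb):
    """Stage 2 (entity disambiguation): pick the item with highest Jaccard overlap."""
    mention_tokens = tokenize(mention)

    def jaccard(item):
        item_tokens = tokenize(item.text)
        union = mention_tokens | item_tokens
        return len(mention_tokens & item_tokens) / len(union) if union else 0.0

    return max(kb, key=jaccard)


def link(sentences, kb):
    """Run both stages and return (sentence, item_id) pairs."""
    return [(s, disambiguate(s, kb).item_id) for s in detect_mentions(sentences)]
```

With a knowledge base holding a life-satisfaction question and a trust-in-parliament question, a sentence such as "Life satisfaction is a widely studied concept." names only a latent concept and is skipped by the toy detector, while "Respondents were asked how satisfied they are with their life." is detected and linked to the life-satisfaction item. That mirrors the distinction between latent concepts and survey item mentions that the abstract argues for.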

Code & Models

Repositories

e-tornike/sil (PyTorch, official)

Taxonomy

Topics: Computational and Text Analysis Methods

Methods: Balanced Selection