Transfer Learning with Synthetic Corpora for Spatial Role Labeling and Reasoning
Roshanak Mirzaee, Parisa Kordjamshidi

TL;DR
This paper introduces new synthetic and real-world datasets for spatial language tasks and shows that pretraining on synthetic data improves model performance, especially when target-domain data is limited.
Contribution
The paper provides two new datasets for spatial question answering (SQA) and spatial role labeling (SpRL), and shows that pretraining on synthetic data improves the performance of spatial language models.
Findings
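Pretraining on synthetic data improves on prior state-of-the-art (SOTA) results.
The synthetic dataset covers a larger variety of spatial relation types and expressions than previous SQA datasets.
Performance gains are most pronounced when target-domain data is scarce.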
Abstract
Recent research shows that synthetic data, used as a source of supervision, helps pretrained language models (PLMs) transfer to new target tasks and domains. However, this idea is less explored for spatial language. We provide two new data resources for multiple spatial language processing tasks. The first dataset is synthesized for transfer learning on spatial question answering (SQA) and spatial role labeling (SpRL). Compared to previous SQA datasets, it includes a larger variety of spatial relation types and spatial expressions. Our data generation process is easily extendable with new spatial expression lexicons. The second is a real-world SQA dataset with human-generated questions, built on an existing corpus with SpRL annotations. This dataset can be used to evaluate spatial language processing models in realistic situations. We show pretraining with automatically generated data…
Taxonomy
Topics: Speech and dialogue systems · Multimodal Machine Learning Applications · Geographic Information Systems Studies
